This commit is contained in:
Zheyuan Wu
2025-02-08 15:31:49 -06:00
parent b749de07c2
commit d99e9fcfdd
6 changed files with 333 additions and 8 deletions

View File

@@ -0,0 +1,87 @@
# Lecture 8
Paper review sharing.
## Recap: Three ways to think about linear classifiers
Geometric view: Hyperplanes in the feature space
Algebraic view: Linear functions of the features
Visual view: One template per class
## Continue on linear classification models
Two-layer networks can be viewed as combinations of templates.
Interpretability is lost as depth increases.
A two-layer network is a **universal approximator**: it can approximate any continuous function (on a compact set) to arbitrary accuracy. But the hidden layer may need to be huge.
[Multi-layer networks demo](https://playground.tensorflow.org)
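A minimal NumPy sketch (added for illustration, not from the lecture) of a two-layer ReLU network trained with full-batch gradient descent to fit $\sin(x)$; as the hidden width grows, the fit error typically shrinks, matching the universal-approximation intuition. All names and settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

def fit_two_layer(width, steps=3000, lr=0.05):
    # Hidden layer h = relu(x W1 + b1), output y_hat = h W2 + b2.
    W1 = rng.normal(size=(1, width))
    b1 = np.zeros(width)
    W2 = rng.normal(size=(width, 1)) / np.sqrt(width)
    b2 = np.zeros(1)
    for _ in range(steps):
        h = np.maximum(x @ W1 + b1, 0.0)       # forward pass
        y_hat = h @ W2 + b2
        g = 2.0 * (y_hat - y) / len(x)         # gradient of MSE w.r.t. y_hat
        gW2, gb2 = h.T @ g, g.sum(axis=0)      # backprop: output layer
        gh = (g @ W2.T) * (h > 0)              # backprop through ReLU
        gW1, gb1 = x.T @ gh, gh.sum(axis=0)    # backprop: hidden layer
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return float(np.mean((y_hat - y) ** 2))

for width in (2, 8, 64):
    print(width, fit_two_layer(width))         # error typically shrinks with width
```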
### Supervised learning outline
1. Collect training data
2. Specify model (select hyper-parameters)
3. Train model
#### Hyper-parameter selection
- Number of layers, number of units per layer, learning rate, etc.
- Type of non-linearity, regularization, etc.
- Type of loss function, etc.
- SGD settings: batch size, number of epochs, etc.
#### Hyper-parameter search
Use a validation set to evaluate model performance.
Never peek at the test set.
Use the training set for K-fold cross-validation.
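A minimal sketch of K-fold cross-validation for hyper-parameter search (added for illustration; the `train_fn`/`score_fn` interface is an assumption, not from the lecture):

```python
import numpy as np

def kfold_splits(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cross_val_score(train_fn, score_fn, X, y, param, k=5):
    """Average validation score of `param` over k train/validation splits."""
    folds = kfold_splits(len(X), k)
    scores = []
    for i in range(k):
        val = folds[i]                                   # held-out fold
        trn = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = train_fn(X[trn], y[trn], param)          # fit on k-1 folds
        scores.append(score_fn(model, X[val], y[val]))   # evaluate on held-out fold
    return float(np.mean(scores))

# Hypothetical usage: search on the training set only; the test set is
# evaluated once, after the search is finished.
# best_lr = max([1e-3, 1e-2, 1e-1],
#               key=lambda p: cross_val_score(train_fn, score_fn, X_tr, y_tr, p))
```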
### Backpropagation
#### Computation graphs
SGD update for each parameter
$$
w_k\gets w_k-\eta\frac{\partial e}{\partial w_k}
$$
$e$ is the error function and $\eta$ is the learning rate.
#### Using the chain rule
Suppose $k=1$, so $e=l(f_1(x,w_1),y)$.
Example: $e=(f_1(x,w_1)-y)^2$.
With $h_1=f_1(x,w_1)=w^T_1x$, we have $e=l(h_1,y)=(y-h_1)^2$.
$$
\frac{\partial e}{\partial w_1}=\frac{\partial e}{\partial h_1}\frac{\partial h_1}{\partial w_1}
$$
$$
\frac{\partial e}{\partial h_1}=2(h_1-y)
$$
$$
\frac{\partial h_1}{\partial w_1}=x
$$
$$
\frac{\partial e}{\partial w_1}=2(h_1-y)x
$$
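A quick numeric check of this gradient (added here; the toy values are made up): compare the analytic $\frac{\partial e}{\partial w_1}=2(h_1-y)x$ against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # toy input
w1 = rng.normal(size=3)       # toy weights
y, eps = 1.5, 1e-6

def error(w):
    h1 = w @ x                # h1 = w1^T x
    return (h1 - y) ** 2      # squared loss

analytic = 2.0 * (w1 @ x - y) * x
numeric = np.array([(error(w1 + eps * e) - error(w1 - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric, atol=1e-5))   # expect: True
```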
#### General backpropagation algorithm

View File

@@ -10,4 +10,5 @@ export default {
CSE559A_L5: "Computer Vision (Lecture 5)",
CSE559A_L6: "Computer Vision (Lecture 6)",
CSE559A_L7: "Computer Vision (Lecture 7)",
CSE559A_L8: "Computer Vision (Lecture 8)",
}

View File

@@ -1 +1,110 @@
# Lecture 11
## Recap
$$
I(x)=\begin{cases}
0 & x\leq 0 \\
1 & x>0
\end{cases}
$$
## Continue on Chapter 6
### The step function
#### Theorem 6.16
If $c_n\geq 0$ for all $n$, $\sum c_n$ converges, $\{s_n\}$ is a sequence of distinct points of $(a,b)$, $f$ is continuous on $[a,b]$, and $\alpha(x)=\sum_{n=1}^{\infty}c_nI(x-s_n)$, then $\int_a^bf \ d\alpha=\sum_{n=1}^{\infty}c_nf(s_n)$.
Proof:
For each $x$, $I(x-s_n)\leq 1$ so $\sum_{n=1}^{\infty}c_nI(x-s_n)\leq \sum_{n=1}^{\infty}c_n$ converges (by comparison test).
Let $\epsilon>0$. We can find $N$ such that $\sum_{n=N+1}^{\infty}c_n<\epsilon$. (Recall that the series $\sum_{n=1}^{\infty}c_n$ converges if and only if $\lim_{N\to\infty}\sum_{n=1}^{N}c_n$ exists.)
Set $\alpha_1(x)=\sum_{n=1}^{N}c_nI(x-s_n)$, and $\alpha_2(x)=\sum_{n=N+1}^{\infty}c_nI(x-s_n)$.
Using the linearity of the integral, we have
$$
\int_a^b f\ d\alpha_1= \sum_{n=1}^{N}c_n\int_a^b f\ dI(x-s_n)= \sum_{n=1}^{N}c_nf(s_n)
$$
On the other hand, with $M=\sup|f|$,
$$
\left|\int_a^b f\ d\alpha_2\right|\leq M[\alpha_2(b)-\alpha_2(a)]= M\sum_{n=N+1}^{\infty}c_n<M\epsilon
$$
So,
$$
\begin{aligned}
\left|\int_a^b f\ d\alpha-\sum_{n=1}^{\infty}c_nf(s_n)\right|&= \left|\int_a^b f\ d\alpha_2-\sum_{n=N+1}^{\infty}c_nf(s_n)\right|\\
&\leq \left|\int_a^b f\ d\alpha_2\right|+M\sum_{n=N+1}^{\infty}c_n\\
&<2M\epsilon
\end{aligned}
$$
Since $\epsilon$ is arbitrary, we have $\int_a^b f\ d\alpha=\sum_{n=1}^{\infty}c_nf(s_n)$.
EOP
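As a concrete illustration (an example added here, not from the lecture), take $c_n=2^{-n}$, $s_n=\frac{1}{n+1}\in(0,1)$, and $f(x)=x$ on $[0,1]$. Theorem 6.16 then gives
$$
\int_0^1 x\ d\alpha=\sum_{n=1}^{\infty}\frac{2^{-n}}{n+1}=2\ln 2-1
$$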
### Integration and differentiation
#### Theorem 6.20 Fundamental theorem of calculus
Let $f\in \mathscr{R}$ on $[a,b]$ and define $F(x)=\int_a^x f(t)\ dt$. Then $F$ is continuous on $[a,b]$; moreover, if $f$ is continuous at $x_0\in [a,b]$, then $F$ is differentiable at $x_0$ and $F'(x_0)=f(x_0)$.
Proof:
Let $x,y\in [a,b]$ with $x<y$. Then,
$$
|F(y)-F(x)|=\left|\int_a^y f(t)\ dt-\int_a^x f(t)\ dt\right|=\left|\int_x^y f(t)\ dt\right|\leq \sup|f|\cdot (y-x)
$$
So, $F$ is continuous on $[a,b]$.
Now, let $f$ be continuous at $x_0\in (a,b)$ and let $\epsilon>0$. Then we can find $\delta>0$ such that $(x_0-\delta,x_0+\delta)\subset (a,b)$ and $|f(u)-f(x_0)|<\epsilon$ for all $u\in (x_0-\delta,x_0+\delta)$. Take any $s,t$ with $x_0-\delta<s<x_0<t<x_0+\delta$.
So,
$$
\begin{aligned}
\left|\frac{F(s)-F(x_0)}{s-x_0}-f(x_0)\right|&=\left|\frac{1}{s-x_0}\left(\int_a^s f(u)\ du-\int_a^{x_0} f(u)\ du\right)-f(x_0)\right|\\
&=\left|\frac{1}{x_0-s}\int_s^{x_0} f(u)\ du-\frac{1}{x_0-s}\int_s^{x_0} f(x_0)\ du\right|\\
&=\left|\frac{1}{x_0-s}\int_s^{x_0} [f(u)-f(x_0)]\ du\right|\\
&\leq \frac{1}{x_0-s}\epsilon(x_0-s)\\
&= \epsilon
\end{aligned}
$$
The estimate for the quotient at $t>x_0$ is symmetric, so $F'(x_0)=f(x_0)$.
EOP
#### Theorem 6.21
If $f\in \mathscr{R}$ on $[a,b]$ and there exists a differentiable function $F:[a,b]\to \mathbb{R}$ such that $F'=f$ on $(a,b)$, then
$$
\int_a^b f(x)\ dx=F(b)-F(a)
$$
Proof:
Let $\epsilon>0$. Since $f\in \mathscr{R}$, we can choose a partition $P=\{x_0,x_1,\cdots,x_n\}$ of $[a,b]$ with $U(P,f)-L(P,f)<\epsilon$.
By the mean value theorem, on each subinterval $[x_{i-1},x_i]$, there exists $t_i\in (x_{i-1},x_i)$ such that
$$
F(x_i)-F(x_{i-1})=F'(t_i)(x_i-x_{i-1})=f(t_i)\Delta x_i
$$
Since $m_i\leq f(t_i)\leq M_i$ on each subinterval, and the sum telescopes, $\sum_{i=1}^n f(t_i)\Delta x_i=\sum_{i=1}^n [F(x_i)-F(x_{i-1})]=F(b)-F(a)$.
So,
$$
L(P,f)\leq F(b)-F(a)\leq U(P,f)
$$
Since also $L(P,f)\leq \int_a^b f(x)\ dx\leq U(P,f)$ and $U(P,f)-L(P,f)<\epsilon$, we get $\left|\int_a^b f(x)\ dx-(F(b)-F(a))\right|<\epsilon$. As $\epsilon$ is arbitrary, $\int_a^b f(x)\ dx=F(b)-F(a)$.
EOP
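A quick numeric sanity check of this theorem (added here, with $f$ and $[a,b]$ chosen arbitrarily): a Riemann sum of $f(x)=\cos x$ over $[0,1]$ should approach $F(1)-F(0)$ with $F(x)=\sin x$.

```python
import numpy as np

a, b, n = 0.0, 1.0, 100_000
t = np.linspace(a, b, n + 1)             # partition of [a, b]
mid = (t[:-1] + t[1:]) / 2               # tags t_i inside each subinterval
riemann = np.sum(np.cos(mid) * np.diff(t))
print(riemann, np.sin(b) - np.sin(a))    # both are approximately 0.841471
```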

View File

@@ -130,21 +130,23 @@ We know $c_n r^ne^{in\theta}\to 0$ as $n\to\infty$.
So there exists $M\geq|c_n r^ne^{in\theta}|$ for all $n\in\mathbb{N}$.
So $\forall \zeta\in\overline{B(0,r)}$, $|c_n\zeta^n|\leq |c_n| |\zeta|^n \leq M \left(\frac{|\zeta|}{r}\right)^n$.
So $\sum_{n=0}^{\infty}c_n\zeta^n$ converges absolutely, by comparison with a geometric series.
So the series converges absolutely and uniformly on $\overline{B(0,r)}$.
_Some steps are omitted._
If $|\zeta| > r$, then $|c_n \zeta^n|$ does not tend to zero, and the series diverges.
EOP
There are a few cases for the convergence of a power series.
#### Possible Cases for the Convergence of Power Series
Let $E=\{\zeta\in\mathbb{C}: \sum_{n=0}^{\infty}c_n(\zeta-\zeta_0)^n\text{ converges}\}$.
1. **Convergence Only at $\zeta = \zeta_0$**:
- **Proof**: If the power series $\sum_{n=0}^{\infty} c_n (\zeta - \zeta_0)^n$ converges only at $\zeta = \zeta_0$, the radius of convergence is $R = 0$. This occurs when the terms $c_n (\zeta - \zeta_0)^n$ fail to tend to zero for every $\zeta \neq \zeta_0$, so the series diverges for all $\zeta \neq \zeta_0$.
2. **Convergence Everywhere**:
- **Proof**: If the power series converges for all $\zeta \in \mathbb{C}$, the radius of convergence $R = \infty$. This implies that the terms $c_n (\zeta - \zeta_0)^n$ tend to zero for all $\zeta$. This can happen if the coefficients $c_n$ decrease rapidly enough, such as in the exponential series.
3. **Convergence Within a Finite Radius**:
- **Proof**: For a power series with a finite radius of convergence $R$, the series converges absolutely and uniformly for $|\zeta - \zeta_0| < R$ and diverges for $|\zeta - \zeta_0| > R$. On the boundary $|\zeta - \zeta_0| = R$, the series may converge or diverge depending on the specific series. This is determined by the behavior of the terms on the boundary.
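Standard examples of the three cases (added here for concreteness):
$$
\sum_{n=0}^{\infty} n!\,\zeta^n\ (R=0),\qquad \sum_{n=0}^{\infty}\frac{\zeta^n}{n!}\ (R=\infty),\qquad \sum_{n=0}^{\infty}\zeta^n\ (R=1)
$$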

pages/Math416/Math416_L8.md Normal file
View File

@@ -0,0 +1,125 @@
# Lecture 8
## Review
### Sequences of Functions
Let $f_n: G \to \mathbb{C}$ be a sequence of functions.
#### Convergence Pointwise
Definition:
Let $\zeta\in G$, $\forall \epsilon > 0$, $\exists N \in \mathbb{N}$ such that $\forall n \geq N$, $|f_n(\zeta) - f(\zeta)| < \epsilon$.
#### Convergence Uniformly
Definition:
$\forall \epsilon > 0$, $\exists N \in \mathbb{N}$ such that $\forall n \geq N$ and $\forall \zeta\in G$, $|f_n(\zeta) - f(\zeta)| < \epsilon$. (Here $N$ depends only on $\epsilon$, not on $\zeta$.)
#### Convergence Locally Uniformly
Definition:
$\forall \zeta\in G$, $\exists r>0$ such that $f_n\to f$ uniformly on $B(\zeta,r)\cap G$.
#### Convergence Uniformly on Compact Sets
Definition: $\forall C\subset G$ that is compact, $\forall \epsilon > 0, \exists N \in \mathbb{N} \text{ s.t. } \forall n \geq N, \forall \zeta\in C, |f_n(\zeta) - f(\zeta)| < \epsilon$
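A standard example separating these modes (added here, not from the lecture): on $G=B(0,1)$, $f_n(\zeta)=\zeta^n\to 0$ pointwise, and uniformly on every compact $C\subset B(0,1)$ since $\sup_{|\zeta|\leq r}|\zeta^n|=r^n\to 0$ for $r<1$; but not uniformly on all of $G$, since $\sup_{\zeta\in G}|\zeta^n|=1$ for every $n$.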
#### Power Series
Definition:
$$
\sum_{n=0}^{\infty} c_n (\zeta - \zeta_0)^n
$$
$\zeta_0$ is the center of the power series.
#### Theorem on Power Series
If a power series centered at $\zeta_0$ converges at some $\zeta_1\neq\zeta_0$, then it converges absolutely and uniformly on every closed ball $\overline{B(\zeta_0,r)}$ with $r<|\zeta_1-\zeta_0|$.
## Continue on Power Series
### Limits of Power Series
#### Theorem 5.12
Cauchy-Hadamard Theorem:
The radius of convergence $R$ of the power series $\sum_{n=0}^{\infty} a_n (\zeta - \zeta_0)^n$ is given by
$$
\frac{1}{R} = \limsup_{n\to\infty} |a_n|^{1/n}
$$
Proof:
Suppose $(b_n)^{\infty}_{n=0}$ is a sequence of real numbers; $\lim_{n\to\infty} b_n$ may or may not exist (e.g., it does not exist for $b_n=(-1)^n(1-\frac{1}{n})$).
The limit superior of $(b_n)$ is defined as
$$
\limsup_{n\to\infty}b_n=\lim_{n\to\infty}s_n,\qquad\text{where } s_n = \sup_{k\geq n} b_k
$$
$(s_n)$ is a decreasing sequence, and by completeness of $\mathbb{R}$, every bounded monotone sequence converges.
So if $(b_n)$ is bounded, $s_n$ converges to some limit $s\in\mathbb{R}$.
The analogous construction with infima gives the limit inferior.
Forward direction:
We want to show that the radius of convergence of $\sum_{n=0}^{\infty} a_n (\zeta - \zeta_0)^n$ is greater than or equal to $\frac{1}{\limsup_{n\to\infty} |a_n|^{1/n}}$.
Recall that the geometric series $\sum_{n=0}^{\infty} \zeta^n=\frac{1}{1-\zeta}$ converges for $|\zeta|<1$. Assume $\limsup_{n\to\infty} |a_n|^{1/n}$ is finite; at $\zeta=\zeta_0$ the series trivially converges absolutely.
Let $\rho>\limsup_{n\to\infty} |a_n|^{1/n}$. Then $\exists N \in \mathbb{N}$ such that $\forall n \geq N$, $|a_n|^{1/n}\leq \rho$.
So for $|\zeta-\zeta_0|<\frac{1}{\rho}$ and $n\geq N$, $|a_n(\zeta-\zeta_0)^n|\leq (\rho|\zeta-\zeta_0|)^n$, and the series converges absolutely by comparison with the geometric series.
Since this holds for every $\rho>\limsup_{n\to\infty} |a_n|^{1/n}$, we get $R\geq\frac{1}{\limsup_{n\to\infty} |a_n|^{1/n}}$.
/*TRACK LOST*/
Backward direction:
Suppose $|\zeta-\zeta_0|>R$. Then we can choose $\rho$ such that $|\zeta-\zeta_0|>\frac{1}{\rho}>R$.
So $\rho<\limsup_{n\to\infty} |a_n|^{1/n}$.
This means there exist infinitely many indices $n_j$ such that $|a_{n_j}|^{1/n_j}>\rho$.
So $|a_{n_j}(\zeta-\zeta_0)^{n_j}|>(\rho|\zeta-\zeta_0|)^{n_j}>1$, since $\rho|\zeta-\zeta_0|>1$.
The individual terms do not tend to $0$, so $\sum_{n=0}^{\infty} a_n (\zeta - \zeta_0)^n$ diverges at $\zeta$.
EOP
_What if $|\zeta-\zeta_0|=R$?_
For $\sum_{n=0}^{\infty} \zeta^n$, the radius of convergence is $1$.
It diverges at every point of the circle of convergence, since $|\zeta^n|=1$ does not tend to $0$ when $|\zeta|=1$.
For $\sum_{n=0}^{\infty} \frac{1}{(n+1)^2}\zeta^n$, the radius of convergence is $1$.
This converges everywhere on the circle of convergence.
For $\sum_{n=0}^{\infty} \frac{1}{n+1}\zeta^n$, the radius of convergence is $1$.
This diverges at $\zeta=1$ (harmonic series) and converges at $\zeta=-1$ (alternating harmonic series).
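As a quick check (added here), Theorem 5.12 gives $R=1$ for all three examples above, since
$$
\limsup_{n\to\infty}1^{1/n}=\limsup_{n\to\infty}\left(\frac{1}{(n+1)^2}\right)^{1/n}=\limsup_{n\to\infty}\left(\frac{1}{n+1}\right)^{1/n}=1
$$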
#### Theorem 5.15
Suppose $\sum_{n=0}^{\infty} a_n (\zeta - \zeta_0)^n$ has a positive radius of convergence $R$. Define $f(\zeta)=\sum_{n=0}^{\infty} a_n (\zeta - \zeta_0)^n$; then $f$ is holomorphic on $B(\zeta_0,R)$ and $f'(\zeta)=\sum_{n=1}^{\infty} n a_n (\zeta - \zeta_0)^{n-1}=\sum_{k=0}^{\infty} (k+1)a_{k+1} (\zeta - \zeta_0)^k$.
Proof:
/*TRACK LOST*/
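A standard illustration of Theorem 5.15 (added here, not from the lecture): for $f(\zeta)=\sum_{n=0}^{\infty}\zeta^n=\frac{1}{1-\zeta}$ on $B(0,1)$, termwise differentiation gives
$$
f'(\zeta)=\sum_{n=1}^{\infty}n\zeta^{n-1}=\frac{1}{(1-\zeta)^2}
$$
with the same radius of convergence $R=1$.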

View File

@@ -10,4 +10,5 @@ export default {
Math416_L5: "Complex Variables (Lecture 5)",
Math416_L6: "Complex Variables (Lecture 6)",
Math416_L7: "Complex Variables (Lecture 7)",
Math416_L8: "Complex Variables (Lecture 8)",
}