# CSE559A Lecture 8

Paper review sharing.
## Recap: Three ways to think about linear classifiers

- Geometric view: hyperplanes in the feature space
- Algebraic view: linear functions of the features
- Visual view: one template per class
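The three views describe the same computation. A minimal NumPy sketch (all names and sizes here are illustrative, not from the lecture): each row of the weight matrix is one class template, scores are linear functions of the features, and the boundary between two classes is a hyperplane.

```python
import numpy as np

rng = np.random.default_rng(0)

# One weight vector ("template") per class, plus a bias: scores = W @ x + b.
num_classes, num_features = 3, 4
W = rng.normal(size=(num_classes, num_features))  # visual view: each row is a template
b = np.zeros(num_classes)

x = rng.normal(size=num_features)
scores = W @ x + b             # algebraic view: linear functions of the features
pred = int(np.argmax(scores))  # predict the best-matching template

# Geometric view: the boundary between classes i and j is the hyperplane
# (W[i] - W[j]) @ x + (b[i] - b[j]) = 0.
```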
## Continuing with linear classification models

Two-layer networks can be viewed as combinations of templates.

Interpretability is lost as depth increases.

A two-layer network is a **universal approximator**: it can approximate any continuous function to arbitrary accuracy. But the hidden layer may need to be huge.

[Multi-layer networks demo](https://playground.tensorflow.org)
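A hedged sketch of the forward pass of such a two-layer network (the function name and layer sizes are made up for illustration): each hidden unit acts as one template, and the output is a weighted combination of their activations.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_layer_net(x, W1, b1, W2, b2):
    """Forward pass: hidden ReLU layer, then a linear readout."""
    h = np.maximum(0.0, W1 @ x + b1)  # hidden units: one "template" match each
    return W2 @ h + b2                # output: weighted combination of templates

# A wide-enough hidden layer can approximate any continuous function on a
# compact set; here we only demonstrate the shapes involved.
num_features, num_hidden, num_classes = 4, 256, 3
W1 = rng.normal(size=(num_hidden, num_features)) * 0.1
b1 = np.zeros(num_hidden)
W2 = rng.normal(size=(num_classes, num_hidden)) * 0.1
b2 = np.zeros(num_classes)

scores = two_layer_net(rng.normal(size=num_features), W1, b1, W2, b2)
```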
### Supervised learning outline

1. Collect training data
2. Specify the model (select hyper-parameters)
3. Train the model

#### Hyper-parameter selection

- Number of layers, number of units per layer, learning rate, etc.
- Type of non-linearity, regularization, etc.
- Type of loss function, etc.
- SGD settings: batch size, number of epochs, etc.
#### Hyper-parameter search

Use a validation set to evaluate model performance.

Never peek at the test set.

Use the training set for K-fold cross-validation.
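K-fold splitting can be sketched as follows (a minimal version written for these notes; the helper name is made up). Each fold serves once as the validation set while the rest is used for training; the test set is never touched during the search.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Score each hyper-parameter setting by its mean validation performance
# across the k splits; only the final chosen model sees the test set once.
splits = list(k_fold_indices(n_samples=10, k=5))
```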
### Backpropagation

#### Computation graphs

SGD update for each parameter:

$$
w_k\gets w_k-\eta\frac{\partial e}{\partial w_k}
$$

where $e$ is the error function and $\eta$ is the learning rate.
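The update rule above is a one-liner in code; this sketch uses an illustrative function name and a made-up gradient value just to show the arithmetic.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """One SGD update: w <- w - eta * de/dw."""
    return w - lr * grad

w = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])        # stand-in for de/dw at the current w
w_new = sgd_step(w, grad, lr=0.1)  # -> [0.95, -2.05]
```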
#### Using the chain rule

Suppose $k=1$ and $e=l(f_1(x,w_1),y)$.

Example: $e=(f_1(x,w_1)-y)^2$.

So $h_1=f_1(x,w_1)=w_1^Tx$ and $e=l(h_1,y)=(h_1-y)^2$.

$$
\frac{\partial e}{\partial w_1}=\frac{\partial e}{\partial h_1}\frac{\partial h_1}{\partial w_1}
$$

$$
\frac{\partial e}{\partial h_1}=2(h_1-y)
$$

$$
\frac{\partial h_1}{\partial w_1}=x
$$

$$
\frac{\partial e}{\partial w_1}=2(h_1-y)x
$$
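The hand-derived gradient $\partial e/\partial w_1 = 2(h_1-y)x$ can be sanity-checked numerically against central finite differences (the data values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
w1 = rng.normal(size=3)
y = 1.5

def error(w):
    h1 = w @ x            # h1 = w1^T x
    return (h1 - y) ** 2  # e = (h1 - y)^2

h1 = w1 @ x
analytic = 2.0 * (h1 - y) * x  # the chain-rule result derived above

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.zeros_like(w1)
for k in range(len(w1)):
    d = np.zeros_like(w1)
    d[k] = eps
    numeric[k] = (error(w1 + d) - error(w1 - d)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```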
#### General backpropagation algorithm