# CSE559A Lecture 8

Paper review sharing.
## Recap: Three ways to think about linear classifiers

- Geometric view: hyperplanes in the feature space
- Algebraic view: linear functions of the features
- Visual view: one template per class
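The three views describe the same computation. A minimal NumPy sketch (all names and sizes here are illustrative, not from the lecture): each row of the weight matrix is one class template, scores are linear functions of the features, and the boundary between two classes is a hyperplane.

```python
import numpy as np

rng = np.random.default_rng(0)

# One weight vector ("template") per class, plus a bias: scores = W @ x + b.
num_classes, num_features = 3, 4
W = rng.normal(size=(num_classes, num_features))  # visual view: each row is a template
b = np.zeros(num_classes)

x = rng.normal(size=num_features)
scores = W @ x + b             # algebraic view: linear functions of the features
pred = int(np.argmax(scores))  # predict the best-matching template

# Geometric view: the boundary between classes i and j is the hyperplane
# (W[i] - W[j]) @ x + (b[i] - b[j]) = 0.
```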
## Continuing with linear classification models

Two-layer networks can be viewed as combinations of templates.

Interpretability is lost as depth increases.

A two-layer network is a **universal approximator**: it can approximate any continuous function to arbitrary accuracy. But the hidden layer may need to be huge.

[Multi-layer networks demo](https://playground.tensorflow.org)
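A hedged sketch of the forward pass of such a two-layer network (the function name and layer sizes are made up for illustration): each hidden unit acts as one template, and the output is a weighted combination of their activations.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_layer_net(x, W1, b1, W2, b2):
    """Forward pass: hidden ReLU layer, then a linear readout."""
    h = np.maximum(0.0, W1 @ x + b1)  # hidden units: one "template" match each
    return W2 @ h + b2                # output: weighted combination of templates

# A wide-enough hidden layer can approximate any continuous function on a
# compact set; here we only demonstrate the shapes involved.
num_features, num_hidden, num_classes = 4, 256, 3
W1 = rng.normal(size=(num_hidden, num_features)) * 0.1
b1 = np.zeros(num_hidden)
W2 = rng.normal(size=(num_classes, num_hidden)) * 0.1
b2 = np.zeros(num_classes)

scores = two_layer_net(rng.normal(size=num_features), W1, b1, W2, b2)
```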
### Supervised learning outline

1. Collect training data
2. Specify the model (select hyper-parameters)
3. Train the model

#### Hyper-parameter selection

- Number of layers, number of units per layer, learning rate, etc.
- Type of non-linearity, regularization, etc.
- Type of loss function, etc.
- SGD settings: batch size, number of epochs, etc.
#### Hyper-parameter search

Use a validation set to evaluate model performance.

Never peek at the test set.

Use the training set for K-fold cross-validation.
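K-fold splitting can be sketched as follows (a minimal version written for these notes; the helper name is made up). Each fold serves once as the validation set while the rest is used for training; the test set is never touched during the search.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Score each hyper-parameter setting by its mean validation performance
# across the k splits; only the final chosen model sees the test set once.
splits = list(k_fold_indices(n_samples=10, k=5))
```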
### Backpropagation

#### Computation graphs

SGD update for each parameter:

$$
w_k\gets w_k-\eta\frac{\partial e}{\partial w_k}
$$

where $e$ is the error function and $\eta$ is the learning rate.
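The update rule above is a one-liner in code; this sketch uses an illustrative function name and a made-up gradient value just to show the arithmetic.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """One SGD update: w <- w - eta * de/dw."""
    return w - lr * grad

w = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])        # stand-in for de/dw at the current w
w_new = sgd_step(w, grad, lr=0.1)  # -> [0.95, -2.05]
```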
#### Using the chain rule

Suppose $k=1$ and $e=l(f_1(x,w_1),y)$.

Example: $e=(f_1(x,w_1)-y)^2$.

So $h_1=f_1(x,w_1)=w_1^Tx$ and $e=l(h_1,y)=(h_1-y)^2$.

$$
\frac{\partial e}{\partial w_1}=\frac{\partial e}{\partial h_1}\frac{\partial h_1}{\partial w_1}
$$

$$
\frac{\partial e}{\partial h_1}=2(h_1-y)
$$

$$
\frac{\partial h_1}{\partial w_1}=x
$$

$$
\frac{\partial e}{\partial w_1}=2(h_1-y)x
$$
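The hand-derived gradient $\partial e/\partial w_1 = 2(h_1-y)x$ can be sanity-checked numerically against central finite differences (the data values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
w1 = rng.normal(size=3)
y = 1.5

def error(w):
    h1 = w @ x            # h1 = w1^T x
    return (h1 - y) ** 2  # e = (h1 - y)^2

h1 = w1 @ x
analytic = 2.0 * (h1 - y) * x  # the chain-rule result derived above

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.zeros_like(w1)
for k in range(len(w1)):
    d = np.zeros_like(w1)
    d[k] = eps
    numeric[k] = (error(w1 + d) - error(w1 - d)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```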
#### General backpropagation algorithm