This commit is contained in:
Zheyuan Wu
2025-02-04 15:14:00 -06:00
parent 1ee87c63e7
commit 8d909a3133
2 changed files with 229 additions and 0 deletions

pages/CSE559A/CSE559A_L7.md

@@ -0,0 +1,228 @@
# Lecture 7
## Computer Vision (Artificial Neural Networks for Image Understanding)
Early example of image understanding with neural networks: *Backpropagation Applied to Handwritten Zip Code Recognition* (LeCun et al., 1989).
Central idea: representation change. Each layer transforms its input into a new, more abstract feature representation.
Plan for next few weeks:
1. How do we train such models?
2. What are the building blocks?
3. How should we combine those building blocks?
## How do we train such models?
CV is finally useful...
1. Image classification
2. Image segmentation
3. Object detection
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
- 1000 classes
- 1.2 million images
- 10000 test images
### Deep Learning (Just neural networks)
Bigger datasets, larger models, faster computers, lots of incremental improvements.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class Net(nn.Module):
    """LeNet-style network: two conv layers followed by three fully connected layers."""

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 feature maps of size 5x5 for a 32x32 input
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


# Create a PyTorch dataset and dataloader (random 32x32 images to match the LeNet input size)
dataset = torch.utils.data.TensorDataset(torch.randn(1000, 1, 32, 32), torch.randint(10, (1000,)))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=4, shuffle=True, num_workers=2)

# Training setup
net = Net()
optimizer = optim.Adam(net.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Loop over the dataset multiple times
for epoch in range(2):
    for i, data in enumerate(dataloader, 0):
        inputs, labels = data
        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = net(inputs)              # forward pass
        loss = criterion(outputs, labels)  # compute the loss
        loss.backward()                    # backward pass
        optimizer.step()                   # parameter update

print("Finished Training")
```
The code above is a generated example of a LeNet-style network and training loop.
### Supervised Learning
Training: given a dataset, learn a mapping from input to output.
Testing: given a new input, predict the output.
Example: Linear classification models
Find a linear function that separates the data.
$$
f(x) = w^T x + b
$$
[Linear classification models](http://cs231n.github.io/linear-classify/)
Simple representation of a linear classifier.
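A minimal sketch of the scoring function $f(x) = w^T x + b$ in PyTorch; the number of classes, input dimension, and random weights here are illustrative assumptions, not values from the lecture.
```python
import torch

# Sketch of a linear classifier score: one score per class, f(x) = W x + b.
# Shapes are illustrative: 10 classes, 3072-dimensional inputs (e.g., flattened 32x32x3 images).
num_classes, num_features = 10, 3072
W = torch.randn(num_classes, num_features) * 0.01  # weight matrix (randomly initialized)
b = torch.zeros(num_classes)                       # bias vector

x = torch.randn(num_features)      # one flattened input image
scores = W @ x + b                 # one score per class
predicted_class = scores.argmax()  # pick the highest-scoring class
print(scores.shape, predicted_class.item())
```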
### Empirical loss minimization framework
Given a training set, find a model that minimizes the loss function.
Assume the samples are i.i.d.
Examples of loss functions:
L1 loss:
$$
\ell(f(x; w), y) = |f(x; w) - y|
$$
L2 loss:
$$
\ell(f(x; w), y) = (f(x; w) - y)^2
$$
### Linear classification models
$$
\hat{L}(w) = \frac{1}{n} \sum_{i=1}^n \ell(f(x_i; w), y_i)
$$
In general (e.g., with the 0-1 classification loss), it is hard to find the global minimum of this objective.
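As a hedged sketch (not from the lecture), the empirical loss $\hat{L}(w)$ of a linear model with the L2 loss can be computed directly; the data and dimensions below are made up for illustration.
```python
import torch

# Sketch: empirical L2 loss of a linear model f(x; w) = w^T x over a training set.
n, d = 100, 5
X = torch.randn(n, d)   # n training inputs (random, purely illustrative)
Y = torch.randn(n)      # n training targets
w = torch.randn(d)      # model parameters

predictions = X @ w                               # f(x_i; w) for every training example
empirical_loss = ((predictions - Y) ** 2).mean()  # (1/n) * sum of squared errors
print(empirical_loss.item())
```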
#### Linear regression
However, if we use the L2 loss, we can find the global minimum in closed form.
$$
\hat{L}(w) = \frac{1}{n} \sum_{i=1}^n (f(x_i; w) - y_i)^2
$$
This is a convex function, so we can find the global minimum.
The gradient is:
$$
\nabla_w||Xw-Y||^2 = 2X^T(Xw-Y)
$$
Setting the gradient to 0, we get:
$$
w = (X^T X)^{-1} X^T Y
$$
From the maximum likelihood perspective, we can also derive the same result.
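A hedged sketch of the normal-equation solution above; the synthetic data and noise level are assumptions for illustration.
```python
import torch

# Sketch: solve least squares via the normal equations w = (X^T X)^{-1} X^T Y.
# Synthetic data: Y = X w_true + noise (purely illustrative).
n, d = 200, 3
X = torch.randn(n, d)
w_true = torch.tensor([1.0, -2.0, 0.5])
Y = X @ w_true + 0.01 * torch.randn(n)

# Closed-form solution (in practice, torch.linalg.lstsq is more numerically stable).
w_hat = torch.linalg.solve(X.T @ X, X.T @ Y)
print(w_hat)  # should be close to w_true
```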
#### Logistic regression
Sigmoid function:
$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$
Although the logistic loss is convex, there is no closed-form solution analogous to the normal equations, so we minimize it iteratively with gradient descent.
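For reference (the notation is mine, not written out in the notes), the logistic (binary cross-entropy) loss for a label $y \in \{0, 1\}$ and prediction $p = \sigma(w^T x)$ is:
$$
\ell(p, y) = -\left[\, y \log p + (1 - y) \log (1 - p) \,\right]
$$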
#### Gradient Descent
Full batch gradient descent:
$$
w \leftarrow w - \eta \nabla_w \hat{L}(w)
$$
Stochastic gradient descent (update using one randomly chosen example $i$ per step):
$$
w \leftarrow w - \eta \nabla_w \ell(f(x_i; w), y_i)
$$
Mini-batch gradient descent (update using a randomly sampled batch $B$ per step):
$$
w \leftarrow w - \eta \frac{1}{|B|} \sum_{i \in B} \nabla_w \ell(f(x_i; w), y_i)
$$
At each step, we update the weights using the average gradient over the mini-batch; the mini-batch is selected randomly from the training set.
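A minimal sketch of mini-batch SGD, here applied to a logistic-regression objective; the synthetic data, batch size, and learning rate are assumptions for illustration.
```python
import torch

# Sketch: mini-batch SGD for logistic regression, with the update step written by hand.
n, d = 500, 10
X = torch.randn(n, d)                  # synthetic inputs (purely illustrative)
y = (torch.rand(n) > 0.5).float()      # synthetic binary labels

w = torch.zeros(d, requires_grad=True)
eta, batch_size = 0.1, 32

for step in range(100):
    idx = torch.randperm(n)[:batch_size]   # sample a random mini-batch
    logits = X[idx] @ w
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, y[idx])
    loss.backward()                         # average gradient over the batch
    with torch.no_grad():
        w -= eta * w.grad                   # gradient step: w <- w - eta * grad
        w.grad.zero_()
```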
#### Multi-class classification
Use the softmax function to convert the output scores into a probability distribution.
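For reference (the notation is mine, not from the notes), the softmax turns a score vector $z \in \mathbb{R}^K$ into a probability distribution:
$$
\text{softmax}(z)_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}
$$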
## Neural Networks
From linear to non-linear.
- Shallow approach:
  - Use a feature transformation to make the data linearly separable.
- Deep approach:
  - Stack multiple layers of linear models, with non-linear functions between them (see the sketch after the list of activations below).
Common non-linear functions:
- ReLU:
- $$
\text{ReLU}(x) = \max(0, x)
$$
- Sigmoid:
- $$
\text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}
$$
- Tanh:
- $$
\text{Tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
$$
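A hedged sketch of the deep approach above: a small MLP stacking linear layers with ReLU non-linearities between them. The layer sizes are illustrative assumptions.
```python
import torch
import torch.nn as nn

# Sketch: stacking linear layers with non-linearities gives a non-linear model.
# Without the ReLUs, this would collapse to a single linear map.
mlp = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Linear(64, 10),   # 10 class scores; apply softmax / cross-entropy on top
)

x = torch.randn(8, 784)  # a batch of 8 flattened 28x28 inputs
scores = mlp(x)
print(scores.shape)      # torch.Size([8, 10])
```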
### Backpropagation


@@ -9,4 +9,5 @@ export default {
CSE559A_L4: "Computer Vision (Lecture 4)",
CSE559A_L5: "Computer Vision (Lecture 5)",
CSE559A_L6: "Computer Vision (Lecture 6)",
CSE559A_L7: "Computer Vision (Lecture 7)",
}