# CSE510 Deep Reinforcement Learning (Lecture 8)
## Convolutional Neural Networks
Another note on this topic, from the computer vision course, can be found here: [CSE559A Lecture 10](../CSE559A/CSE559A_L10#convolutional-layer)
Basically, a CNN is a stack of different layers (a minimal code sketch follows the list):
- Convolutional layer
- Non-linearity layer
- Pooling layer (or downsampling layer)
- Fully connected layer
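
To make the stacking concrete, here is a minimal PyTorch-style sketch of one such stack. PyTorch itself and the specific sizes (a 28x28 single-channel input, 8 filters, 10 output classes) are assumptions for illustration, not part of the lecture.

```python
import torch
import torch.nn as nn

# One convolution -> non-linearity -> pooling -> fully connected stack.
cnn = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                                           # non-linearity layer
    nn.MaxPool2d(kernel_size=2, stride=2),                               # pooling (downsampling) layer
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                                          # fully connected layer
)

x = torch.randn(1, 1, 28, 28)  # a dummy single-channel 28x28 image
print(cnn(x).shape)            # torch.Size([1, 10])
```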
### Convolutional layer
Filtering: the math behind the matching (see the code sketch after the steps).
1. Line up the feature and the image patch.
2. Multiply each image pixel by the corresponding feature pixel.
3. Add them up.
4. Divide by the total number of pixels in the feature.
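
A minimal NumPy sketch of these four steps for a single filter position; the 3x3 feature and image patch below, with values of ±1, are made up for illustration.

```python
import numpy as np

# Step 1: a 3x3 feature (filter) lined up with a 3x3 image patch.
feature = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])
patch   = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])

# Steps 2-4: multiply element-wise, add up, divide by the number of pixels.
match = (feature * patch).sum() / feature.size
print(match)  # 1.0 -> the patch matches the feature perfectly
```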
The idea of a convolutional neural network, in some sense, is to let the network "learn" the right filters for a specific task.
### Non-linearity Layer
> [!TIP]
>
> This is irrelevant to the lecture, but consider the following term:
>
> "Bounded rationality"
- Convolution is a linear operation
- The non-linearity layer creates an activation map from the feature map generated by the convolutional layer
- Consists of an activation function (an element-wise operation)
- Rectified linear units (ReLUs) are advantageous over the traditional sigmoid or tanh activation functions (see the sketch below)
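
A minimal NumPy sketch of this element-wise step, using ReLU; the feature-map values are made up.

```python
import numpy as np

# A small feature map from a convolutional layer (made-up values).
feature_map = np.array([[ 0.77, -0.11],
                        [-0.11,  1.00]])

# ReLU is applied element-wise: negative responses become zero.
activation_map = np.maximum(feature_map, 0.0)
print(activation_map)
# [[0.77 0.  ]
#  [0.   1.  ]]
```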
### Pooling layer
Shrinking the Image Stack
- Motivation: the activation maps can be large
- Reducing the spatial size of the activation maps
- Often applied after multiple stages of other layers (i.e., convolutional and non-linearity layers)
- Steps (see the code sketch after this list):
    1. Pick a window size (usually 2 or 3).
    2. Pick a stride (usually 2).
    3. Walk your window across your filtered images.
    4. From each window, take the maximum value.
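
A minimal NumPy sketch of these steps (max pooling with a 2x2 window and stride 2); the `max_pool` helper and the 4x4 activation map are made up for illustration.

```python
import numpy as np

def max_pool(activation_map, window=2, stride=2):
    """Walk a window across the map and keep the maximum of each window."""
    h, w = activation_map.shape
    out_h = (h - window) // stride + 1  # windows that do not fit are dropped
    out_w = (w - window) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            pooled[i, j] = activation_map[r:r + window, c:c + window].max()
    return pooled

a = np.array([[0.1, 0.9, 0.2, 0.4],
              [0.5, 0.3, 0.8, 0.0],
              [0.7, 0.2, 0.6, 0.1],
              [0.4, 0.6, 0.3, 0.9]])
print(max_pool(a))
# [[0.9 0.8]
#  [0.7 0.9]]
```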
Pros:
- Reducing the computational requirements
- Minimizing the likelihood of overfitting
Cons:
- Aggressive reduction can limit the depth of a network and ultimately limit the performance
### Fully connected layer
- Multilayer perceptron (MLP)
- Mapping the activation volume from previous layers into a class probability distribution
- Non-linearity is built into the neurons, instead of being a separate layer
- Can be viewed as 1x1 convolution kernels
For classification: the output layer is a regular, fully connected layer with a softmax non-linearity (sketched in code below)
- Output provides an estimate of the conditional probability of each class
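
A minimal NumPy sketch of this output stage (a fully connected layer followed by a softmax); the 32-dimensional flattened input, the random weights, and the 3 classes are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=32)       # flattened activation volume from earlier layers (made up)
W = rng.normal(size=(3, 32))  # fully connected output layer for 3 classes (made up)
b = np.zeros(3)

logits = W @ x + b                     # fully connected layer
probs = np.exp(logits - logits.max())
probs /= probs.sum()                   # softmax: estimate of P(class | input)

print(probs, probs.sum())  # three non-negative values summing to 1
```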
> [!TIP]
>
> The golden triangle of machine learning:
>
> - Data
> - Algorithm
> - Computation