CSE510 Deep Reinforcement Learning (Lecture 7)
Large Scale RL
So far we have represented value functions by a lookup table
- Every state s has an entry V(s), or
- Every state-action pair (s, a) has an entry Q(s, a)
Reinforcement learning should be used to solve large problems, e.g.
- Backgammon: 10^20 states
- Computer Go: 10^170 states
- Helicopter, robot, ...: enormous continuous state space
Tabular methods clearly cannot handle this. Why?
- There are too many states and/or actions to store in memory
- It is too slow to learn the value of each state individually
- You cannot generalize across states!
Value Function Approximation (VFA)
Solution for large MDPs:
- Estimate the value function using a function approximator
Value function approximation (VFA) replaces the table with a general parameterized form:
\hat{V}(s, \theta) \approx V_\pi(s)
or
\hat{Q}(s, a, \theta) \approx Q_\pi(s, a)
Benefits:
- Can generalize across states
- Save memory (only need to store the function approximator parameters)
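As an illustration, here is a minimal sketch of a linear value function approximator with a semi-gradient TD(0)-style update. The feature map, state encoding, and learning-rate values are illustrative assumptions, not from the lecture:

```python
import numpy as np

# Hypothetical feature map: represent a state by a small feature vector phi(s).
# Here the state is a toy 2-D coordinate (x, y) with hand-picked features.
def phi(state):
    x, y = state
    return np.array([1.0, x, y, x * y])  # bias + coordinates + interaction term

# V_hat(s; theta) = theta^T phi(s): a few parameters replace a huge lookup table,
# and states with similar features automatically share value estimates.
theta = np.zeros(4)

def v_hat(state, theta):
    return theta @ phi(state)

# The gradient of V_hat w.r.t. theta is just phi(s), so a TD(0)-style update is:
def td_update(state, reward, next_state, theta, alpha=0.1, gamma=0.99):
    target = reward + gamma * v_hat(next_state, theta)
    td_error = target - v_hat(state, theta)
    return theta + alpha * td_error * phi(state)
```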
End-to-End RL
End-to-end RL methods replace the hand-designed state representation with raw observations.
- Good: We get rid of manual design of state representations
- Bad: We need far more data to train the network, since the raw observation O_t is usually much higher dimensional than a hand-designed state S_t
Function Approximation
- Linear function approximation
- Neural network function approximation
- Decision tree function approximation
- Nearest neighbor
- ...
In this course, we will focus on linear combinations of features and neural networks.
Today we will cover deep neural networks (fully connected and convolutional).
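For concreteness, a minimal sketch of a fully connected Q-value approximator, assuming PyTorch; the layer sizes, state dimension, and number of actions are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Fully connected Q-network: Q_hat(s, a; theta) for a discrete action space.
# One output unit per action, so a single forward pass scores all actions.
class QNetwork(nn.Module):
    def __init__(self, state_dim=4, num_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # one Q-value per action
        )

    def forward(self, state):
        return self.net(state)

q = QNetwork()
state = torch.randn(1, 4)           # a batch containing one 4-dimensional state
q_values = q(state)                 # shape (1, num_actions)
greedy_action = q_values.argmax(dim=1)
```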
Artificial Neural Networks
Neuron
f: \mathbb{R}^k \to \mathbb{R}
z=a_1w_1+a_2w_2+\cdots+a_kw_k+b
a_1,a_2,\cdots,a_k are the inputs, w_1,w_2,\cdots,w_k are the weights, b is the bias.
Then we apply an activation function \sigma (usually non-linear), so the neuron outputs \sigma(z).
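A minimal sketch of a single neuron in Python; the inputs, weights, bias, and choice of activation are illustrative:

```python
import numpy as np

def neuron(a, w, b, sigma=np.tanh):
    """Single neuron: weighted sum of inputs plus bias, passed through an activation."""
    z = np.dot(w, a) + b      # z = a_1*w_1 + ... + a_k*w_k + b
    return sigma(z)           # tanh used here as an example; common choices are listed below

a = np.array([0.5, -1.0, 2.0])   # inputs a_1, ..., a_k
w = np.array([0.1, 0.4, -0.3])   # weights w_1, ..., w_k
b = 0.2                          # bias
print(neuron(a, w, b))
```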
Activation functions
- ReLU (rectified linear unit): \text{ReLU}(x) = \max(0, x)
- Sigmoid: \text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}
Both outputs are non-negative: ReLU returns 0 or a positive value, and the sigmoid always lies in (0, 1).
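For concreteness, a minimal sketch of both activations in Python (the test values are illustrative):

```python
import numpy as np

def relu(x):
    # max(0, x), applied elementwise; output is 0 or positive
    return np.maximum(0.0, x)

def sigmoid(x):
    # 1 / (1 + e^{-x}); output is strictly between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))     # [0. 0. 3.]
print(sigmoid(z))  # approximately [0.119, 0.5, 0.953]
```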