# CSE510 Deep Reinforcement Learning (Lecture 7)

## Large Scale RL

So far we have represented value functions by a lookup table:

- Every state $s$ has an entry $V(s)$, or
- Every state-action pair $(s, a)$ has an entry $Q(s, a)$

Reinforcement learning should be used to solve large problems, e.g.

- Backgammon: $10^{20}$ states
- Computer Go: $10^{170}$ states
- Helicopter, robot, ...: enormous continuous state space

Tabular methods clearly cannot handle this. Why?

- There are too many states and/or actions to store in memory
- It is too slow to learn the value of each state individually
- You cannot generalize across states!

### Value Function Approximation (VFA)

Solution for large MDPs:

- Estimate the value function using a function approximator

**Value function approximation (VFA)** replaces the table with a general parameterized form:

$$
\hat{V}(s, \theta) \approx V_\pi(s)
$$

or

$$
\hat{Q}(s, a, \theta) \approx Q_\pi(s, a)
$$

Benefits:

- Can generalize across states
- Saves memory (only need to store the function approximator parameters $\theta$)

### End-to-End RL

End-to-end RL methods replace the hand-designed state representation with raw observations.

- Good: we get rid of manual design of state representations
- Bad: we need tons of data to train the network, since $O_t$ is usually WAY higher-dimensional than a hand-designed $S_t$

## Function Approximation

- Linear function approximation
- Neural network function approximation
- Decision tree function approximation
- Nearest neighbor
- ...

In this course, we will focus on **linear combinations of features** and **neural networks**.

Today we will cover deep neural networks (fully connected and convolutional).

### Artificial Neural Networks

#### Neuron

A neuron is a function $f: \mathbb{R}^k\to \mathbb{R}$. It first computes the weighted sum

$z=a_1w_1+a_2w_2+\cdots+a_kw_k+b$

where $a_1,a_2,\cdots,a_k$ are the inputs, $w_1,w_2,\cdots,w_k$ are the weights, and $b$ is the bias.

It then applies an activation function $\sigma(z)$ (usually non-linear) to produce the output.

##### Activation functions

The outputs of the two activation functions below are never negative.
- ReLU (rectified linear unit):
  - $$ \text{ReLU}(x) = \max(0, x) $$
- Sigmoid:
  - $$ \text{Sigmoid}(x) = \frac{1}{1 + e^{-x}} $$
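
To make the neuron definition concrete, here is a minimal sketch in plain Python of a single neuron that computes $z = a_1w_1 + \cdots + a_kw_k + b$ and then applies ReLU or Sigmoid. The function names (`relu`, `sigmoid`, `neuron`) and the example inputs, weights, and bias are illustrative assumptions, not taken from the lecture. The sigmoid is written in a branch form that is algebraically the same as $\frac{1}{1 + e^{-x}}$ but avoids overflow for very negative inputs.

```python
import math


def relu(z: float) -> float:
    """ReLU(z) = max(0, z)."""
    return max(0.0, z)


def sigmoid(z: float) -> float:
    """Sigmoid(z) = 1 / (1 + e^{-z}), computed without overflowing exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z).
    e = math.exp(z)
    return e / (1.0 + e)


def neuron(inputs, weights, bias, activation):
    """Return activation(z), where z = sum_i inputs[i] * weights[i] + bias."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(a * w for a, w in zip(inputs, weights)) + bias
    return activation(z)


# Hypothetical example values (assumed for illustration only).
a = [1.0, 2.0, 3.0]    # inputs a_1..a_k
w = [0.5, -1.0, 0.25]  # weights w_1..w_k
b = 0.25               # bias

# Here z = 0.5 - 2.0 + 0.75 + 0.25 = -0.5.
relu_out = neuron(a, w, b, relu)        # ReLU clips the negative z to 0
sigmoid_out = neuron(a, w, b, sigmoid)  # Sigmoid maps z into (0, 1)
```

With these assumed values the pre-activation is negative, so ReLU outputs zero while Sigmoid returns a value strictly between 0 and 0.5. A full layer would simply evaluate many such neurons on the same inputs, each with its own weights and bias.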