
# CSE510 Deep Reinforcement Learning (Lecture 7)

## Large Scale RL

So far we have represented value functions by a lookup table:

- Every state s has an entry V(s), or
- Every state-action pair (s, a) has an entry Q(s, a)

We would like to use reinforcement learning to solve large problems, e.g.

- Backgammon: $10^{20}$ states
- Computer Go: $10^{170}$ states
- Helicopter, robot, ...: enormous continuous state space

Tabular methods clearly cannot handle this. Why?

- There are too many states and/or actions to store in memory
- It is too slow to learn the value of each state individually
- You cannot generalize across states!
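
To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes one 8-byte float per table entry, which is an illustrative assumption rather than something stated in the lecture, and `table_bytes` is a hypothetical helper name.

```python
BYTES_PER_ENTRY = 8  # assumption: one 64-bit float per stored value

def table_bytes(num_states, bytes_per_entry=BYTES_PER_ENTRY):
    """Memory needed to store one value V(s) per state in a lookup table."""
    return num_states * bytes_per_entry

backgammon_states = 10**20  # state count quoted for Backgammon above
exabytes = table_bytes(backgammon_states) / 10**18  # 1 EB = 10^18 bytes
print(f"Backgammon V-table: about {exabytes:.0f} EB")
```

Under these assumptions the Backgammon table alone would need roughly 800 exabytes, and a Q-table would multiply that by the number of actions.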

### Value Function Approximation (VFA)

Solution for large MDPs:

- Estimate the value function using a function approximator

**Value function approximation (VFA)** replaces the table with a general parameterized form:

$$
\hat{V}(s, \theta) \approx V_\pi(s)
$$

or

$$
\hat{Q}(s, a, \theta) \approx Q_\pi(s, a)
$$

Benefits:

- Generalizes across states
- Saves memory (only the function approximator's parameters need to be stored)
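
As a minimal sketch of what a parameterized $\hat{V}(s, \theta)$ can look like, the example below uses a linear combination of features. The feature map `features` (polynomial features of a scalar state) and the values in `theta` are hypothetical choices made only for illustration.

```python
def features(state):
    """Hypothetical feature map phi(s) for a scalar state: [1, s, s^2]."""
    s = float(state)
    return [1.0, s, s * s]

def v_hat(state, theta):
    """Linear value estimate: V_hat(s, theta) = sum_i theta_i * phi_i(s)."""
    return sum(t * f for t, f in zip(theta, features(state)))

theta = [0.5, -1.0, 2.0]  # illustrative parameters, not learned values

# Three parameters cover every real-valued state, including unseen ones.
print(v_hat(2.0, theta))
print(v_hat(2.1, theta))
```

Nearby states such as `2.0` and `2.1` share the same parameters, so an update to `theta` changes the estimate for both. This is the generalization a lookup table cannot provide.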

### End-to-End RL

End-to-end RL methods replace the hand-designed state representation with raw observations.

- Good: we no longer need to design state representations by hand
- Bad: we need far more data to train the network, since the raw observation $O_t$ is usually much higher-dimensional than a hand-designed state $S_t$

## Function Approximation

- Linear function approximation
- Neural network function approximation
- Decision tree function approximation
- Nearest neighbor
- ...

In this course, we will focus on **linear combinations of features** and **neural networks**.

Today we will cover deep neural networks (fully connected and convolutional).

### Artificial Neural Networks

#### Neuron

A neuron computes a function $f: \mathbb{R}^k\to \mathbb{R}$. It first forms a weighted sum of its inputs:

$z=a_1w_1+a_2w_2+\cdots+a_kw_k+b$

$a_1,a_2,\cdots,a_k$ are the inputs, $w_1,w_2,\cdots,w_k$ are the weights, $b$ is the bias.

The neuron then outputs $\sigma(z)$, where $\sigma$ is an activation function (usually non-linear).
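
The computation of a single neuron can be sketched as follows. The function name `neuron`, the example numbers, and the choice of ReLU as the activation are assumptions made for illustration.

```python
def relu(z):
    """Example non-linear activation: max(0, z)."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation=relu):
    """Compute sigma(z) with z = a_1*w_1 + ... + a_k*w_k + b."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(a * w for a, w in zip(inputs, weights)) + bias
    return activation(z)

# k = 2 inputs: z = 1.0*0.5 + 2.0*(-0.25) + 1.0 = 1.0
print(neuron([1.0, 2.0], [0.5, -0.25], bias=1.0))
# Same inputs with bias -1.0 give z = -1.0, which ReLU clips to 0.0
print(neuron([1.0, 2.0], [0.5, -0.25], bias=-1.0))
```

Passing the activation as an argument keeps the weighted sum separate from the non-linearity, so the same sketch works with any $\sigma$.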

##### Activation functions

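The two activation functions listed here never produce negative outputs: ReLU takes values in $[0, \infty)$ and sigmoid in $(0, 1)$.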

- ReLU (rectified linear unit):
  - $$
    \text{ReLU}(x) = \max(0, x)
    $$
- Sigmoid:
  - $$
    \text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}
    $$
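
The two formulas above translate directly into code. This is a minimal sketch using only the standard library. The branch on the sign of `x` in `sigmoid` is an added implementation detail (not from the lecture) that avoids overflow in `math.exp` for large negative inputs.

```python
import math

def relu(x):
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)

def sigmoid(x):
    """Sigmoid(x) = 1 / (1 + e^{-x}), evaluated in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x < 0, so e is in (0, 1)
    return e / (1.0 + e)

for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), sigmoid(x))
```

For any finite input, `relu` returns a value that is at least 0 and `sigmoid` returns a value between 0 and 1.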