CSE510 Deep Reinforcement Learning (Lecture 7)
Large Scale RL
So far we have represented value functions by a lookup table
- Every state s has an entry V(s), or
- Every state-action pair (s, a) has an entry Q(s, a)
Reinforcement learning should be used to solve large problems, e.g.
- Backgammon: 10^20 states
- Computer Go: 10^170 states
- Helicopter, robot, ...: enormous continuous state space
Tabular methods clearly cannot handle this. Why?
- There are too many states and/or actions to store in memory
- It is too slow to learn the value of each state individually
- You cannot generalize across states!
Value Function Approximation (VFA)
Solution for large MDPs:
- Estimate the value function using a function approximator
Value function approximation (VFA) replaces the table with a general parameterized form:
\hat{V}(s, \theta) \approx V_\pi(s)
or
\hat{Q}(s, a, \theta) \approx Q_\pi(s, a)
Benefits:
- Can generalize across states
- Save memory (only need to store the function approximator parameters)
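As an illustration, here is a minimal sketch of a linear value function approximator with a semi-gradient TD(0)-style update. The feature map, state encoding, and learning-rate values are illustrative assumptions, not from the lecture:

```python
import numpy as np

# Hypothetical feature map: represent a state by a small feature vector phi(s).
# Here the state is a toy 2-D coordinate (x, y) with hand-picked features.
def phi(state):
    x, y = state
    return np.array([1.0, x, y, x * y])  # bias + coordinates + interaction term

# V_hat(s; theta) = theta^T phi(s): a few parameters replace a huge lookup table,
# and states with similar features automatically share value estimates.
theta = np.zeros(4)

def v_hat(state, theta):
    return theta @ phi(state)

# The gradient of V_hat w.r.t. theta is just phi(s), so a TD(0)-style update is:
def td_update(state, reward, next_state, theta, alpha=0.1, gamma=0.99):
    target = reward + gamma * v_hat(next_state, theta)
    td_error = target - v_hat(state, theta)
    return theta + alpha * td_error * phi(state)
```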
End-to-End RL
End-to-end RL methods replace the hand-designed state representation with raw observations.
- Good: We get rid of manual design of state representations
- Bad: We need far more data to train the network, since the raw observation O_t is usually much higher dimensional than a hand-designed state S_t
Function Approximation
- Linear function approximation
- Neural network function approximation
- Decision tree function approximation
- Nearest neighbor
- ...
In this course, we will focus on linear combinations of features and neural networks.
Today we will cover deep neural networks (fully connected and convolutional).
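For concreteness, a minimal sketch of a fully connected Q-value approximator, assuming PyTorch; the layer sizes, state dimension, and number of actions are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Fully connected Q-network: Q_hat(s, a; theta) for a discrete action space.
# One output unit per action, so a single forward pass scores all actions.
class QNetwork(nn.Module):
    def __init__(self, state_dim=4, num_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # one Q-value per action
        )

    def forward(self, state):
        return self.net(state)

q = QNetwork()
state = torch.randn(1, 4)           # a batch containing one 4-dimensional state
q_values = q(state)                 # shape (1, num_actions)
greedy_action = q_values.argmax(dim=1)
```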
Artificial Neural Networks
Neuron
f: \mathbb{R}^k \to \mathbb{R}
z=a_1w_1+a_2w_2+\cdots+a_kw_k+b
a_1,a_2,\cdots,a_k are the inputs, w_1,w_2,\cdots,w_k are the weights, b is the bias.
Then we apply an activation function \sigma (usually non-linear), so the neuron outputs \sigma(z).
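A minimal sketch of a single neuron in Python; the inputs, weights, bias, and choice of activation are illustrative:

```python
import numpy as np

def neuron(a, w, b, sigma=np.tanh):
    """Single neuron: weighted sum of inputs plus bias, passed through an activation."""
    z = np.dot(w, a) + b      # z = a_1*w_1 + ... + a_k*w_k + b
    return sigma(z)           # tanh used here as an example; common choices are listed below

a = np.array([0.5, -1.0, 2.0])   # inputs a_1, ..., a_k
w = np.array([0.1, 0.4, -0.3])   # weights w_1, ..., w_k
b = 0.2                          # bias
print(neuron(a, w, b))
```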
Activation functions
- ReLU (rectified linear unit): \text{ReLU}(x) = \max(0, x)
- Sigmoid: \text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}
Both outputs are non-negative: ReLU returns 0 or a positive value, and the sigmoid always lies in (0, 1).
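For concreteness, a minimal sketch of both activations in Python (the test values are illustrative):

```python
import numpy as np

def relu(x):
    # max(0, x), applied elementwise; output is 0 or positive
    return np.maximum(0.0, x)

def sigmoid(x):
    # 1 / (1 + e^{-x}); output is strictly between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))     # [0. 0. 3.]
print(sigmoid(z))  # approximately [0.119, 0.5, 0.953]
```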