2025-09-16 10:29:48 -05:00

CSE510 Deep Reinforcement Learning (Lecture 7)

Large Scale RL

So far we have represented value functions by a lookup table:

  • Every state s has an entry V(s), or
  • Every state-action pair (s, a) has an entry Q(s, a)
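As a minimal sketch of what "lookup table" means here, the snippet below stores one number per state and one per state-action pair. The grid-cell states, the action names, and the helper `greedy_action` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Tabular value functions: one stored number per state (V)
# or per state-action pair (Q). Unseen entries default to 0.0.
V = defaultdict(float)  # V[s]
Q = defaultdict(float)  # Q[(s, a)]

# Hypothetical grid-world entries, for illustration only.
V[(0, 0)] = 0.5
Q[((0, 0), "right")] = 0.7
Q[((0, 0), "left")] = 0.2


def greedy_action(q_table, state, actions):
    """Return the action with the largest stored Q(state, a)."""
    return max(actions, key=lambda a: q_table[(state, a)])


best = greedy_action(Q, (0, 0), ["left", "right"])
```

Every entry is independent of the others: updating `V[(0, 0)]` tells the table nothing about `V[(0, 1)]`.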

Reinforcement learning should be used to solve large problems, e.g.

  • Backgammon: 10^20 states
  • Computer Go: 10^170 states
  • Helicopter, robot, ...: enormous continuous state space

Tabular methods clearly cannot handle this. Why?

  • There are too many states and/or actions to store in memory
  • It is too slow to learn the value of each state individually
  • You cannot generalize across states!
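To make the memory argument concrete, here is a rough back-of-the-envelope calculation for the backgammon figure above. It assumes one 8-byte (64-bit float) entry per state, which is an illustrative choice and not something the lecture specifies.

```python
def table_bytes(num_states: int, bytes_per_entry: int = 8) -> int:
    """Memory needed for a table with one entry per state.

    bytes_per_entry=8 assumes a 64-bit float per value (an assumption).
    """
    return num_states * bytes_per_entry


backgammon_states = 10**20
backgammon_bytes = table_bytes(backgammon_states)

# Convert to exabytes (1 EB = 10**18 bytes).
backgammon_exabytes = backgammon_bytes / 10**18
print(f"V-table for backgammon: about {backgammon_exabytes:.0f} EB")
```

Under that assumption the state-value table alone would need about 800 exabytes, before counting actions or the time needed to visit each state.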

Value Function Approximation (VFA)

Solution for large MDPs:

  • Estimate the value function using a function approximator

Value function approximation (VFA) replaces the table with a general parameterized form:


\hat{V}(s, \theta) \approx V_\pi(s)

or


\hat{Q}(s, a, \theta) \approx Q_\pi(s, a)

Benefits:

  • Can generalize across states
  • Saves memory (only the function approximator's parameters need to be stored)
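One simple instance of \hat{V}(s, \theta) is a linear combination of features, \hat{V}(s, \theta) = \theta^\top \phi(s). The sketch below assumes a one-dimensional state and a hypothetical feature map `features` (bias, s, s^2). Both are illustrative choices, not the lecture's definitions.

```python
def features(state: float) -> list[float]:
    """Hypothetical feature map phi(s) for a 1-D state: [1, s, s^2]."""
    s = float(state)
    return [1.0, s, s * s]


def v_hat(state: float, theta: list[float]) -> float:
    """Linear value estimate: the dot product of theta and phi(state)."""
    return sum(t * f for t, f in zip(theta, features(state)))


# Three parameters describe the value of every state, seen or unseen.
theta = [1.0, 0.5, 0.25]
value_at_2 = v_hat(2.0, theta)   # 1.0 + 0.5*2 + 0.25*4
value_at_0 = v_hat(0.0, theta)   # only the bias term contributes
```

Only `theta` is stored, so memory does not grow with the number of states, and changing one parameter shifts the estimate for many states at once. That shift is the generalization.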

End-to-End RL

End-to-end RL methods replace the hand-designed state representation with raw observations.

  • Good: We get rid of manual design of state representations
  • Bad: We need far more data to train the network, since the raw observation O_t is usually much higher-dimensional than a hand-designed state S_t
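A rough way to see the dimensionality gap is to count the parameters of a first fully connected layer for each kind of input. The sizes below (a 4-number hand-designed state, an 84x84 grayscale frame, 64 hidden units) are hypothetical, and parameter count is only a loose proxy for how much data is needed.

```python
def dense_layer_params(input_dim: int, hidden_units: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return input_dim * hidden_units + hidden_units


hidden_units = 64            # hypothetical layer width
hand_designed_dim = 4        # hypothetical S_t, e.g. a few physical quantities
raw_observation_dim = 84 * 84  # hypothetical O_t: one grayscale frame

params_state = dense_layer_params(hand_designed_dim, hidden_units)
params_pixels = dense_layer_params(raw_observation_dim, hidden_units)
ratio = params_pixels / params_state
```

With these numbers the first layer grows from 320 to 451,648 parameters, more than a thousandfold increase.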

Function Approximation

  • Linear function approximation
  • Neural network function approximation
  • Decision tree function approximation
  • Nearest neighbor
  • ...

In this course, we will focus on linear combinations of features and neural networks.

Today we will cover deep neural networks (fully connected and convolutional).

Artificial Neural Networks

Neuron

f: \mathbb{R}^k \to \mathbb{R}

z=a_1w_1+a_2w_2+\cdots+a_kw_k+b

a_1,a_2,\cdots,a_k are the inputs, w_1,w_2,\cdots,w_k are the weights, b is the bias.

The neuron's output is \sigma(z), where \sigma is an activation function (usually non-linear).
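The computation of a single neuron can be sketched in a few lines. The function name `neuron` and the example numbers are illustrative. The activation is passed in as an argument, and ReLU is used here only as an example.

```python
def neuron(inputs, weights, bias, activation):
    """Compute sigma(z) with z = a_1*w_1 + ... + a_k*w_k + b."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(a * w for a, w in zip(inputs, weights)) + bias
    return activation(z)


def relu(z):
    """Example activation: max(0, z)."""
    return max(0.0, z)


# z = 1.0*0.5 + 2.0*0.25 + 0.5 = 1.5, and ReLU leaves it unchanged.
out_positive = neuron([1.0, 2.0], [0.5, 0.25], 0.5, relu)

# z = 1.0*(-2.0) + 0.5 = -1.5, which ReLU clips to 0.
out_clipped = neuron([1.0], [-2.0], 0.5, relu)
```

A fully connected layer is many such neurons sharing the same inputs, each with its own weights and bias.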

Activation functions

Always non-negative output:

  • ReLU (rectified linear unit):

      \text{ReLU}(x) = \max(0, x)

  • Sigmoid:

      \text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}
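The two formulas above translate directly into code. This is a minimal sketch using only the standard library. The two-branch form of `sigmoid` is an implementation choice (not from the lecture) that avoids overflow in `math.exp` for large negative inputs.

```python
import math


def relu(x: float) -> float:
    """ReLU(x) = max(0, x): zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Sigmoid(x) = 1 / (1 + e^{-x}), computed without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For x < 0, use the equivalent form e^x / (1 + e^x),
    # so that exp is only ever called on a negative number.
    e = math.exp(x)
    return e / (1.0 + e)


relu_values = [relu(x) for x in (-2.0, 0.0, 3.0)]
sigmoid_at_zero = sigmoid(0.0)
```

ReLU outputs are never negative, and sigmoid outputs lie between 0 and 1, with Sigmoid(0) = 0.5.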