CSE510 Deep Reinforcement Learning (Lecture 17)

Model-based RL

Model-based RL vs. Model-free RL

  • Sample efficiency
  • Generalization and transferability
  • Support efficient exploration in large-scale RL problems
  • Explainability
  • Super-human performance in practice

Deterministic Environment: Cross-Entropy Method

Stochastic Optimization

Abstract away optimal control/planning:

a_1,\ldots,a_T = \argmax_{a_1,\ldots,a_T} J(a_1,\ldots,a_T)

A = \argmax_{A} J(A), \quad \text{where } A = (a_1,\ldots,a_T)

Simplest method, guess and check: the "random shooting method"

  • pick A_1, A_2, ..., A_n from some distribution (e.g. uniform)
  • choose the best sequence A_i via \argmax_i J(A_i) (see the sketch below)
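
A minimal sketch of random shooting, assuming a black-box objective J(A) that rolls an action sequence out in the model and returns its total reward; the bounds, horizon, and sample count below are illustrative:

```python
import numpy as np

def random_shooting(J, T, action_dim, n_samples=1000, low=-1.0, high=1.0):
    """Guess and check: sample n action sequences uniformly, evaluate each
    with J, and return the best one (an open-loop plan)."""
    candidates = np.random.uniform(low, high, size=(n_samples, T, action_dim))
    scores = np.array([J(A) for A in candidates])
    return candidates[np.argmax(scores)]   # the A_i with the highest J(A_i)
```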

Cross-Entropy Method with continuous-valued inputs

  1. sample A_1, A_2, ..., A_n from some distribution p(A)
  2. evaluate J(A_1), J(A_2), ..., J(A_n)
  3. pick the elites A_1, A_2, ..., A_m with the highest J(A_i), where m<n
  4. refit the distribution p(A) to the elites, so it is more likely to generate them, and repeat from step 1 (a sketch follows this list)
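
A minimal sketch of CEM with a diagonal-Gaussian p(A), assuming the same black-box objective J as above; the sample count, elite count, and iteration count are illustrative:

```python
import numpy as np

def cross_entropy_method(J, T, action_dim, n_samples=500, n_elites=50, n_iters=10):
    """CEM with a diagonal-Gaussian sampling distribution p(A).
    J(A) scores a full action sequence A of shape (T, action_dim)."""
    mean = np.zeros((T, action_dim))
    std = np.ones((T, action_dim))
    for _ in range(n_iters):
        # 1. sample A_1, ..., A_n from p(A)
        samples = mean + std * np.random.randn(n_samples, T, action_dim)
        # 2. evaluate J(A_1), ..., J(A_n)
        scores = np.array([J(A) for A in samples])
        # 3. pick the m elites with the highest J(A_i)
        elites = samples[np.argsort(scores)[-n_elites:]]
        # 4. refit p(A) to the elites, then repeat
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # planned action sequence (mean of the final distribution)
```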

Pros:

  • Very fast to run if parallelized
  • Extremely simple to implement

Cons:

  • Very harsh dimensionality limit
  • Only open-loop planning
  • Suboptimal in stochastic environments

Discrete Case: Monte Carlo Tree Search (MCTS)

Discrete planning as a search problem

Closed-loop planning:

  • At each state, iteratively build a search tree to evaluate actions, select the best first action, and then move to the next state (sketched below).
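
A minimal UCT-style sketch of this loop, assuming a deterministic discrete model step(s, a) -> (next_state, reward) and a small action set; the tree policy, rollout length, and exploration constant c are illustrative, and terminal-state handling is omitted:

```python
import math
import random

class Node:
    """One search-tree node: visit count and running-mean return estimate."""
    def __init__(self, state):
        self.state = state
        self.children = {}   # action -> (reward, child Node)
        self.visits = 0
        self.value = 0.0

def ucb1(parent, child, c):
    """UCB1 score balancing the child's mean return and an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    return child.value + c * math.sqrt(math.log(parent.visits) / child.visits)

def mcts_plan(root_state, step, actions, n_sims=500, c=1.4, horizon=20):
    """Build a search tree from root_state and return the best first action."""
    root = Node(root_state)
    for _ in range(n_sims):
        node, ret = root, 0.0
        path = [(root, 0.0)]          # (node, reward accumulated before it)
        # 1. Selection: descend with UCB1 while the node is fully expanded
        while node.children and len(node.children) == len(actions):
            a = max(actions, key=lambda a: ucb1(node, node.children[a][1], c))
            r, node = node.children[a]
            ret += r
            path.append((node, ret))
        # 2. Expansion: try one untried action from this leaf
        untried = [a for a in actions if a not in node.children]
        if untried:
            a = random.choice(untried)
            s_next, r = step(node.state, a)
            child = Node(s_next)
            node.children[a] = (r, child)
            ret += r
            node = child
            path.append((node, ret))
        # 3. Rollout: random actions for a fixed horizon
        s = node.state
        for _ in range(horizon):
            s, r = step(s, random.choice(actions))
            ret += r
        # 4. Backup: each node is updated with the return from it onward
        for n, prefix in path:
            n.visits += 1
            n.value += ((ret - prefix) - n.value) / n.visits
    # Best first action = most visited child of the root
    return max(root.children, key=lambda a: root.children[a][1].visits)
```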

Continuous Case: Trajectory Optimization

Linear Quadratic Regulator (LQR)
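
Since the notes only name LQR, here is a hedged sketch of the standard finite-horizon backward Riccati recursion for a known linear model and quadratic cost; the matrix names A, B, Q, R, Qf are generic placeholders, not from the lecture:

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, T):
    """Finite-horizon discrete-time LQR via the backward Riccati recursion.
    Dynamics: x_{t+1} = A x_t + B u_t
    Cost:     sum_t (x_t^T Q x_t + u_t^T R u_t) + x_T^T Qf x_T
    Returns feedback gains K_0, ..., K_{T-1}."""
    P = Qf
    gains = []
    for _ in range(T):
        # K_t = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_t = Q + A^T P (A - B K_t)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()   # gains[t] applies at time t
    return gains      # optimal control: u_t = -gains[t] @ x_t
```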

Non-linear iterative LQR (iLQR) / Differential Dynamic Programming (DDP)