CSE510 Deep Reinforcement Learning (Lecture 17)

Model-based RL

Model-based RL vs. Model-free RL

  • Sample efficiency
  • Generalization and transferability
  • Support efficient exploration in large-scale RL problems
  • Explainability
  • Super-human performance in practice

Deterministic Environment: Cross-Entropy Method

Stochastic Optimization

Abstract away optimal control/planning:

a_1,\ldots,a_T = \argmax_{a_1,\ldots,a_T} J(a_1,\ldots,a_T)

A = \argmax_{A} J(A), \quad \text{where } A = (a_1,\ldots,a_T)

Simplest method, guess and check: the "random shooting method"

  • pick A_1, A_2, ..., A_n from some distribution (e.g. uniform)
  • choose the best sequence A_i via \argmax_i J(A_i) (see the sketch below)
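
A minimal sketch of random shooting, assuming a black-box objective J(A) that rolls an action sequence out in the model and returns its total reward; the bounds, horizon, and sample count below are illustrative:

```python
import numpy as np

def random_shooting(J, T, action_dim, n_samples=1000, low=-1.0, high=1.0):
    """Guess and check: sample n action sequences uniformly, evaluate each
    with J, and return the best one (an open-loop plan)."""
    candidates = np.random.uniform(low, high, size=(n_samples, T, action_dim))
    scores = np.array([J(A) for A in candidates])
    return candidates[np.argmax(scores)]   # the A_i with the highest J(A_i)
```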

Cross-Entropy Method with continuous-valued inputs

  1. sample A_1, A_2, ..., A_n from some distribution p(A)
  2. evaluate J(A_1), J(A_2), ..., J(A_n)
  3. pick the elites A_1, A_2, ..., A_m with the highest J(A_i), where m<n
  4. refit the distribution p(A) to the elites, so it is more likely to generate them, and repeat from step 1 (a sketch follows this list)
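
A minimal sketch of CEM with a diagonal-Gaussian p(A), assuming the same black-box objective J as above; the sample count, elite count, and iteration count are illustrative:

```python
import numpy as np

def cross_entropy_method(J, T, action_dim, n_samples=500, n_elites=50, n_iters=10):
    """CEM with a diagonal-Gaussian sampling distribution p(A).
    J(A) scores a full action sequence A of shape (T, action_dim)."""
    mean = np.zeros((T, action_dim))
    std = np.ones((T, action_dim))
    for _ in range(n_iters):
        # 1. sample A_1, ..., A_n from p(A)
        samples = mean + std * np.random.randn(n_samples, T, action_dim)
        # 2. evaluate J(A_1), ..., J(A_n)
        scores = np.array([J(A) for A in samples])
        # 3. pick the m elites with the highest J(A_i)
        elites = samples[np.argsort(scores)[-n_elites:]]
        # 4. refit p(A) to the elites, then repeat
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # planned action sequence (mean of the final distribution)
```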

Pros:

  • Very fast to run if parallelized
  • Extremely simple to implement

Cons:

  • Very harsh dimensionality limit
  • Only open-loop planning
  • Suboptimal in stochastic environments

Discrete Case: Monte Carlo Tree Search (MCTS)

Discrete planning as a search problem

Closed-loop planning:

  • At each state, iteratively build a search tree to evaluate actions, select the best first action, and then move to the next state (sketched below).
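
A minimal UCT-style sketch of this loop, assuming a deterministic discrete model step(s, a) -> (next_state, reward) and a small action set; the tree policy, rollout length, and exploration constant c are illustrative, and terminal-state handling is omitted:

```python
import math
import random

class Node:
    """One search-tree node: visit count and running-mean return estimate."""
    def __init__(self, state):
        self.state = state
        self.children = {}   # action -> (reward, child Node)
        self.visits = 0
        self.value = 0.0

def ucb1(parent, child, c):
    """UCB1 score balancing the child's mean return and an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    return child.value + c * math.sqrt(math.log(parent.visits) / child.visits)

def mcts_plan(root_state, step, actions, n_sims=500, c=1.4, horizon=20):
    """Build a search tree from root_state and return the best first action."""
    root = Node(root_state)
    for _ in range(n_sims):
        node, ret = root, 0.0
        path = [(root, 0.0)]          # (node, reward accumulated before it)
        # 1. Selection: descend with UCB1 while the node is fully expanded
        while node.children and len(node.children) == len(actions):
            a = max(actions, key=lambda a: ucb1(node, node.children[a][1], c))
            r, node = node.children[a]
            ret += r
            path.append((node, ret))
        # 2. Expansion: try one untried action from this leaf
        untried = [a for a in actions if a not in node.children]
        if untried:
            a = random.choice(untried)
            s_next, r = step(node.state, a)
            child = Node(s_next)
            node.children[a] = (r, child)
            ret += r
            node = child
            path.append((node, ret))
        # 3. Rollout: random actions for a fixed horizon
        s = node.state
        for _ in range(horizon):
            s, r = step(s, random.choice(actions))
            ret += r
        # 4. Backup: each node is updated with the return from it onward
        for n, prefix in path:
            n.visits += 1
            n.value += ((ret - prefix) - n.value) / n.visits
    # Best first action = most visited child of the root
    return max(root.children, key=lambda a: root.children[a][1].visits)
```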

Continuous Case: Trajectory Optimization

Linear Quadratic Regulator (LQR)
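
Since the notes only name LQR, here is a hedged sketch of the standard finite-horizon backward Riccati recursion for a known linear model and quadratic cost; the matrix names A, B, Q, R, Qf are generic placeholders, not from the lecture:

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, T):
    """Finite-horizon discrete-time LQR via the backward Riccati recursion.
    Dynamics: x_{t+1} = A x_t + B u_t
    Cost:     sum_t (x_t^T Q x_t + u_t^T R u_t) + x_T^T Qf x_T
    Returns feedback gains K_0, ..., K_{T-1}."""
    P = Qf
    gains = []
    for _ in range(T):
        # K_t = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_t = Q + A^T P (A - B K_t)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()   # gains[t] applies at time t
    return gains      # optimal control: u_t = -gains[t] @ x_t
```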

Non-linear iterative LQR (iLQR) / Differential Dynamic Programming (DDP)