CSE510 Deep Reinforcement Learning (Lecture 17)
Model-based RL
Model-based RL vs. Model-free RL
- Sample efficiency
- Generalization and transferability
- Efficient exploration in large-scale RL problems
- Explainability
- Super-human performance in practice
Deterministic Environment: Cross-Entropy Method
Stochastic Optimization
Abstract away optimal control/planning as a generic optimization problem:
a_1, \ldots, a_T = \argmax_{a_1, \ldots, a_T} J(a_1, \ldots, a_T)
Writing A = (a_1, \ldots, a_T), this becomes
A = \argmax_A J(A)
Simplest method: guess and check ("random shooting method"):
- Pick A_1, A_2, ..., A_n from some distribution (e.g., uniform)
- Choose the best sequence A_i based on \argmax_i J(A_i)
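Below is a minimal NumPy sketch of random shooting; the objective J, the horizon, and the action bounds are hypothetical placeholders for whatever the learned model and reward provide.

```python
import numpy as np

def random_shooting(J, horizon, action_dim, n_samples=1000,
                    low=-1.0, high=1.0, rng=None):
    """Open-loop random shooting: sample action sequences at random,
    return the one with the highest objective J(A)."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample n candidate action sequences A_1, ..., A_n uniformly.
    candidates = rng.uniform(low, high, size=(n_samples, horizon, action_dim))
    # Evaluate J(A_i) for each candidate and keep the best one.
    scores = np.array([J(A) for A in candidates])
    return candidates[np.argmax(scores)]
```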
Cross-Entropy Method (CEM) with continuous-valued inputs:
- Sample A_1, A_2, ..., A_n from some distribution p(A)
- Evaluate J(A_1), J(A_2), ..., J(A_n)
- Pick the m elites with the highest J(A_i), where m < n
- Update the distribution p(A) to be more likely to generate the elites, and repeat
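A minimal sketch of CEM with a factorized Gaussian as the sampling distribution p(A); J, the horizon, and the hyperparameters are illustrative assumptions, not a specific implementation from the lecture.

```python
import numpy as np

def cross_entropy_method(J, horizon, action_dim, n_samples=500,
                         n_elites=50, n_iters=10, rng=None):
    """Iteratively refit a Gaussian over action sequences to the elite samples."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample A_1, ..., A_n from the current distribution p(A).
        samples = mean + std * rng.standard_normal((n_samples, horizon, action_dim))
        # Evaluate J(A_i) and pick the m elites with the highest scores.
        scores = np.array([J(A) for A in samples])
        elites = samples[np.argsort(scores)[-n_elites:]]
        # Refit p(A) so it is more likely to generate the elites.
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # planned open-loop action sequence
```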
Pros:
- Very fast to run if parallelized
- Extremely simple to implement
Cons:
- Very harsh dimensionality limit
- Only open-loop planning
- Suboptimal in stochastic environments
Discrete Case: Monte Carlo Tree Search (MCTS)
Discrete planning as a search problem
Closed-loop planning:
- At each state, iteratively build a search tree to evaluate actions, take the best first action, and then move to the next state (see the UCT sketch below).
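A minimal UCT-style MCTS sketch, assuming a hypothetical discrete deterministic model with methods model.actions(state) and model.step(state, action) -> (next_state, reward); the exploration constant, rollout policy, and depth are illustrative choices, and terminal states are not handled.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # running mean of simulated returns

def uct_select(node, c=1.4):
    # Pick the child maximizing the UCB1 score (exploitation + exploration bonus).
    return max(node.children.items(),
               key=lambda kv: kv[1].value
               + c * math.sqrt(math.log(node.visits) / (kv[1].visits + 1e-8)))

def rollout(model, state, depth=20, gamma=0.99):
    # Default policy: simulate random actions and return the discounted reward.
    total, discount = 0.0, 1.0
    for _ in range(depth):
        action = random.choice(model.actions(state))
        state, reward = model.step(state, action)
        total += discount * reward
        discount *= gamma
    return total

def mcts(model, root_state, n_simulations=200):
    root = Node(root_state)
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend with UCT while nodes are fully expanded.
        while node.children and len(node.children) == len(model.actions(node.state)):
            _, node = uct_select(node)
        # 2. Expansion: add one untried action as a new child.
        untried = [a for a in model.actions(node.state) if a not in node.children]
        if untried:
            action = random.choice(untried)
            next_state, reward = model.step(node.state, action)
            node.children[action] = Node(next_state, parent=node)
            node = node.children[action]
        else:
            reward = 0.0
        # 3. Simulation: estimate the new node's value with a random rollout.
        ret = reward + rollout(model, node.state)
        # 4. Backpropagation: update visit counts and mean values up to the root.
        while node is not None:
            node.visits += 1
            node.value += (ret - node.value) / node.visits
            node = node.parent
    # Best first action: the most-visited child of the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```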