# CSE510 Deep Reinforcement Learning (Lecture 17)

## Model-based RL

### Model-based RL vs. Model-free RL

- Sample efficiency
- Generalization and transferability
- Support efficient exploration in large-scale RL problems
- Explainability
- Super-human performance in practice

### Deterministic Environment: Cross-Entropy Method

#### Stochastic Optimization

Abstract away optimal control/planning:

$$
a_1,\ldots, a_T =\argmax_{a_1,\ldots, a_T} J(a_1,\ldots, a_T)
$$

$$
A=\argmax_{A} J(A)
$$

Simplest method: guess and check: "random shooting method"

- Pick $A_1, A_2, \ldots, A_n$ from some distribution (e.g. uniform)
- Choose $A_i$ based on $\argmax_i J(A_i)$

#### Cross-Entropy Method with continuous-valued inputs

1. Sample $A_1, A_2, \ldots, A_n$ from some distribution $p(A)$
2. Evaluate $J(A_1), J(A_2), \ldots, J(A_n)$
3. Pick the _elites_ $A_1, A_2, \ldots, A_m$ with the highest $J(A_i)$, where $m