Update CSE510_L17.md

2025-10-23 10:58:10 -05:00
parent 824a0bfd7c
commit 5ac36745e2
1 changed files with 53 additions and 0 deletions
--- a/content/CSE510/CSE510_L17.md
+++ b/content/CSE510/CSE510_L17.md
@@ -10,3 +10,56 @@
 - Explainability
 - Super-human performance in practice
 ### Deterministic Environment: Cross-Entropy Method
 #### Stochastic Optimization
 Abstract away the optimal control/planning problem as generic optimization over the action sequence:
 $$
 a_1,\ldots, a_T =\argmax_{a_1,\ldots, a_T} J(a_1,\ldots, a_T)
 $$
 or, writing $A = (a_1,\ldots,a_T)$ for the whole action sequence,
 $$
 A^\star=\argmax_{A} J(A)
 $$
 Simplest method: guess and check (the "random shooting method"):
 - sample $A_1, A_2, \ldots, A_n$ from some distribution (e.g. uniform)
 - choose $A_{i^\star}$ with $i^\star = \argmax_i J(A_i)$
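 A minimal sketch of random shooting, assuming a hypothetical toy objective `toy_J` (a deterministic 1-D system $x_{t+1} = x_t + a_t$ pushed toward a goal, with a small action penalty) and uniform sampling on $[-1, 1]$. The function names and constants are illustrative, not from the lecture:

```python
import random

def toy_J(A, x0=0.0, goal=1.0):
    """Hypothetical objective: roll out the deterministic 1-D system
    x_{t+1} = x_t + a_t and penalize distance to `goal` plus action size."""
    x, total = x0, 0.0
    for a in A:
        x = x + a
        total -= (x - goal) ** 2 + 0.1 * a ** 2
    return total

def random_shooting(J, T, n, low=-1.0, high=1.0, seed=0):
    """Guess and check: draw n action sequences uniformly, keep the best one."""
    rng = random.Random(seed)
    best_A, best_J = None, float("-inf")
    for _ in range(n):
        # One candidate A = (a_1, ..., a_T) with each a_t ~ Uniform[low, high].
        A = [rng.uniform(low, high) for _ in range(T)]
        value = J(A)
        if value > best_J:
            best_A, best_J = A, value
    return best_A, best_J

best_A, best_J = random_shooting(toy_J, T=3, n=500)
print(best_A, best_J)
```

 Every candidate is evaluated independently, so the loop parallelizes trivially; that is where the "very fast to run if parallelized" advantage of these sampling methods comes from.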
 #### Cross-Entropy Method with continuous-valued inputs
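 1. sample $A_1, A_2, \ldots, A_n$ from some distribution $p(A)$
 2. evaluate $J(A_1), J(A_2), \ldots, J(A_n)$
 3. pick the _elites_ $A_{i_1}, \ldots, A_{i_m}$, the $m<n$ samples with the highest $J(A_i)$
 4. refit $p(A)$ to the elites so that it is more likely to produce them (e.g. for a Gaussian $p(A)$, set its mean and covariance to the elites' sample mean and covariance)
 5. repeat from step 1 until convergence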
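 The steps above, repeated for a fixed number of iterations, can be sketched as follows. This assumes a diagonal Gaussian $p(A)$, which is a common choice but only one option; `toy_J` is the same hypothetical 1-D objective as in the random-shooting sketch, and `n`, `m`, `iters`, `init_std`, and the `min_std` floor are illustrative values:

```python
import random
import statistics

def toy_J(A, x0=0.0, goal=1.0):
    """Hypothetical objective: deterministic 1-D system x_{t+1} = x_t + a_t."""
    x, total = x0, 0.0
    for a in A:
        x = x + a
        total -= (x - goal) ** 2 + 0.1 * a ** 2
    return total

def cem(J, T, n=100, m=10, iters=30, init_std=1.0, min_std=1e-3, seed=0):
    """Cross-entropy method with a diagonal Gaussian p(A); returns its final mean."""
    rng = random.Random(seed)
    mu = [0.0] * T
    sigma = [init_std] * T
    for _ in range(iters):
        # 1. Sample n action sequences from p(A) = N(mu, diag(sigma^2)).
        samples = [[rng.gauss(mu[t], sigma[t]) for t in range(T)] for _ in range(n)]
        # 2.-3. Evaluate every sample and keep the m elites with the highest J.
        elites = sorted(samples, key=J, reverse=True)[:m]
        # 4. Refit p(A) to the elites; the std floor keeps a little exploration.
        mu = [statistics.fmean([e[t] for e in elites]) for t in range(T)]
        sigma = [max(statistics.pstdev([e[t] for e in elites]), min_std)
                 for t in range(T)]
    return mu

A = cem(toy_J, T=3)
print(A, toy_J(A))
```

 Refitting only a mean and a per-dimension standard deviation keeps each update cheap; a full covariance could capture correlations between time steps, at the cost of $O((T \cdot \dim a)^2)$ parameters.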
 Pros:
 - Very fast to run if parallelized
 - Extremely simple to implement
 Cons:
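 - Very harsh dimensionality limit: samples cover the search space poorly once $T \times \dim(a)$ grows
 - Only open-loop planning
 - Suboptimal in stochastic environments, since an open-loop plan cannot react to the states actually reached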
 ### Discrete Case: Monte Carlo Tree Search (MCTS)
 Discrete planning as a search problem
 Closed-loop planning:
 - At each state, iteratively build a search tree to evaluate actions, select the best first action, execute it, and then move to the next state.
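 The closed-loop procedure above can be sketched with UCT, one common MCTS variant (UCB1 selection, one-node expansion, random rollouts, averaging backup). This is an illustrative sketch under assumptions: the number-line problem, `GOAL`, `HORIZON`, the exploration constant `c = 1.4`, the iteration budget, and returning the most-visited root action are hypothetical choices, not taken from the lecture:

```python
import math
import random

# Hypothetical toy problem: walk on the integers with actions -1/+1 for HORIZON
# steps; the return is 1.0 if the walk ends exactly on GOAL, else 0.0.
GOAL, HORIZON, ACTIONS = 3, 5, (-1, +1)

def step(state, action):
    pos, t = state  # state = (position, time step)
    return (pos + action, t + 1)

def is_terminal(state):
    return state[1] >= HORIZON

def terminal_reward(state):
    return 1.0 if state[0] == GOAL else 0.0

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}  # action -> child Node
        self.visits, self.value_sum = 0, 0.0

def ucb1(child, parent_visits, c=1.4):
    # Mean return plus an exploration bonus for rarely visited children.
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(root_state, n_iters, rng):
    """Build a search tree from root_state; return the most-visited root action."""
    root = Node(root_state)
    for _ in range(n_iters):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCB1.
        while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb1(ch, parent.visits))
        # 2. Expansion: attach one untried action as a new leaf.
        if not is_terminal(node.state):
            action = next(a for a in ACTIONS if a not in node.children)
            child = Node(step(node.state, action), parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: random rollout from the leaf to the end of the episode.
        state = node.state
        while not is_terminal(state):
            state = step(state, rng.choice(ACTIONS))
        reward = terminal_reward(state)
        # 4. Backup: add the rollout return to every node on the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)

def plan_and_act(start=(0, 0), n_iters=300, seed=0):
    """Closed-loop planning: rebuild the tree at every real state, take one action."""
    rng = random.Random(seed)
    state, actions = start, []
    while not is_terminal(state):
        action = mcts(state, n_iters, rng)
        actions.append(action)
        state = step(state, action)
    return state, actions

final_state, actions = plan_and_act()
print(final_state, actions)
```

 Each call to `mcts` discards the old tree and searches again from the state actually reached, which is what makes the plan closed-loop; reusing the subtree under the chosen action is a common optimization left out here for brevity.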
 ### Continuous Case: Trajectory Optimization
 #### Linear Quadratic Regulator (LQR)
 #### Nonlinear iterative LQR (iLQR) / Differential Dynamic Programming (DDP)