Update CSE510_L17.md

2025-10-23 10:58:10 -05:00
parent 824a0bfd7c
commit 5ac36745e2
1 changed files with 53 additions and 0 deletions
--- a/content/CSE510/CSE510_L17.md
+++ b/content/CSE510/CSE510_L17.md
@@ -10,3 +10,56 @@
 - Explainability
 - Super-human performance in practice

+### Deterministic Environment: Cross-Entropy Method
+
+#### Stochastic Optimization
+
+Abstract away optimal control/planning as the optimization of a single objective $J$ over the action sequence:
+
+$$
+a_1,\ldots, a_T =\argmax_{a_1,\ldots, a_T} J(a_1,\ldots, a_T)
+$$
+
+Writing $A = (a_1,\ldots, a_T)$ for the whole action sequence, this is simply:
+
+$$
+A=\argmax_{A} J(A)
+$$
+
+The simplest method is guess and check, also called the "random shooting method":
+
+- pick $A_1, A_2, \ldots, A_n$ from some distribution (e.g. uniform)
+- choose $A_{i^*}$, where $i^* = \argmax_i J(A_i)$
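+
+As a small worked illustration (the objective and the sample values are made up for this example, not taken from the lecture), consider a scalar plan with $J(A) = -(A-2)^2$ and three sampled candidates:
+
+$$
+J(0.5) = -2.25,\quad J(1.7) = -0.09,\quad J(3.1) = -1.21
+$$
+
+Random shooting would return $A = 1.7$, the sampled candidate with the highest $J$, even though the true maximizer is $A = 2$. The quality of the answer depends entirely on how well the samples happen to cover the good region.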
+
+#### Cross-Entropy Method with continuous-valued inputs
+
+1. sample $A_1, A_2, \ldots, A_n$ from some distribution $p(A)$
+2. evaluate $J(A_1), J(A_2), \ldots, J(A_n)$
+3. pick the _elites_: the $m$ samples with the highest $J(A_i)$, where $m<n$
+4. update the distribution $p(A)$ to put more probability mass on the elites, then repeat from step 1
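+
+One common way to carry out step 4 (an assumption here, since the notes do not fix the form of $p(A)$) is to take $p(A)=\mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and refit it to the elites. Writing $E$ for the set of the $m$ elite indices:
+
+$$
+\mu \leftarrow \frac{1}{m}\sum_{i\in E} A_i, \qquad \sigma^2 \leftarrow \frac{1}{m}\sum_{i\in E} (A_i-\mu)^2
+$$
+
+The square is taken elementwise, so each action dimension gets its own variance. Under this choice, the mean moves toward the high-$J$ samples and the variance typically shrinks as the elites cluster, which is what separates the method from plain random shooting.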
+
+Pros:
+
+- Very fast to run if parallelized
+- Extremely simple to implement
+
+Cons:
+
+- Very harsh dimensionality limit
+- Only open-loop planning
+- Suboptimal in stochastic environments
+
+### Discrete Case: Monte Carlo Tree Search (MCTS)
+
+Discrete planning as a search problem
+
+Closed-loop planning:
+
+- At each state, iteratively build a search tree to evaluate actions, select the best first action, and then move to the next state.
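+
+The notes do not say how the tree is expanded. A widely used choice, stated here as an assumption and not as the lecture's method, is the UCT rule, which picks the action to explore at a tree node $s$ by
+
+$$
+a^* = \argmax_{a} \left[ Q(s,a) + c\sqrt{\frac{\ln N(s)}{N(s,a)}} \right]
+$$
+
+Here $Q(s,a)$ is the average return of the simulations that passed through $(s,a)$, $N(s)$ and $N(s,a)$ are visit counts, and $c>0$ is an exploration constant. The first term favors actions that have looked good so far, while the second favors actions that have rarely been tried.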
+
+
+
+### Continuous Case: Trajectory Optimization
+
+#### Linear Quadratic Regulator (LQR)
+
+#### Non-linear iterative LQR (iLQR)/ Differential Dynamic Programming (DDP)
+