From 5ac36745e265fd46837f59bf4a711a47d5e5e0e5 Mon Sep 17 00:00:00 2001
From: Zheyuan Wu <60459821+Trance-0@users.noreply.github.com>
Date: Thu, 23 Oct 2025 10:58:10 -0500
Subject: [PATCH] Update CSE510_L17.md

---
 content/CSE510/CSE510_L17.md | 53 ++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/content/CSE510/CSE510_L17.md b/content/CSE510/CSE510_L17.md
index 86cf02b..332c0c2 100644
--- a/content/CSE510/CSE510_L17.md
+++ b/content/CSE510/CSE510_L17.md
@@ -10,3 +10,56 @@
 - Explainability
 - Super-human performance in practice
 
+### Deterministic Environment: Cross-Entropy Method
+
+#### Stochastic Optimization
+
+Abstract away the specifics of optimal control/planning and treat it as a generic optimization over action sequences:
+
+$$
+a_1,\ldots, a_T =\argmax_{a_1,\ldots, a_T} J(a_1,\ldots, a_T)
+$$
+
+Writing the whole action sequence as $A = (a_1, \ldots, a_T)$, this becomes:
+
+$$
+A=\argmax_{A} J(A)
+$$
+
+Simplest method: guess and check (the "random shooting method"):
+
+- Pick $A_1, A_2, \ldots, A_n$ from some distribution (e.g. uniform)
+- Choose the $A_i$ with the highest value, i.e. $i^\ast = \argmax_i J(A_i)$
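+
+In symbols, the guess-and-check procedure above can be sketched as follows, assuming the samples are drawn i.i.d. from a fixed proposal distribution $p(A)$ (the symbol $\hat{A}$ for the returned plan is introduced here only for illustration):
+
+$$
+A_i \sim p(A), \quad i = 1, \ldots, n, \qquad \hat{A} = A_{i^\ast}, \quad i^\ast = \argmax_{i \in \{1, \ldots, n\}} J(A_i)
+$$
+
+Note that $p(A)$ is never updated here, so nothing steers later samples toward high-value regions.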
+
+#### Cross-Entropy Method with continuous-valued inputs
+
+1. Sample $A_1, A_2, \ldots, A_n$ from some distribution $p(A)$
+2. Evaluate $J(A_1), J(A_2), \ldots, J(A_n)$
+3. Pick the _elites_ $A_{i_1}, A_{i_2}, \ldots, A_{i_m}$, the $m$ samples with the highest $J(A_i)$, where $m<n$
+4. Refit $p(A)$ to the elites so that it places more probability on them, then repeat from step 1
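+
+A common concrete choice, assumed here for illustration rather than taken from the lecture, is a Gaussian $p(A) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$. Writing $E$ for the index set of the $m$ elites (a symbol introduced here), step 4 then reduces to a maximum-likelihood refit of the mean and the element-wise variance:
+
+$$
+\mu \leftarrow \frac{1}{m} \sum_{i \in E} A_i, \qquad \sigma^2 \leftarrow \frac{1}{m} \sum_{i \in E} (A_i - \mu)^2
+$$
+
+As a toy check with made-up numbers: for scalar $A$ and $J(A) = -(A-2)^2$, the samples $0, 1, 2, 4$ score $-4, -1, 0, -4$. With $m = 2$ the elites are $A = 2$ and $A = 1$, giving $\mu = 1.5$ and $\sigma^2 = 0.25$, so the next round of samples concentrates near the maximizer $A = 2$.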
+
+Pros:
+
+- Very fast to run if parallelized
+- Extremely simple to implement
+
+Cons:
+
+- Very harsh dimensionality limit: sampling covers high-dimensional $A$ poorly
+- Only open-loop planning
+- Suboptimal in stochastic environments
+
+### Discrete Case: Monte Carlo Tree Search (MCTS)
+
+Discrete planning as a search problem
+
+Closed-loop planning:
+
+- At each state, iteratively build a search tree to evaluate actions, select the best action, and then move to the next state.
+
+
+
+### Continuous Case: Trajectory Optimization
+
+#### Linear Quadratic Regulator (LQR)
+
+#### Non-linear iterative LQR (iLQR) / Differential Dynamic Programming (DDP)
+