Update CSE510_L17.md
@@ -10,3 +10,56 @@
- Explainability
- Super-human performance in practice

### Deterministic Environment: Cross-Entropy Method

#### Stochastic Optimization
Stochastic optimization abstracts away the structure of the optimal control/planning problem and treats it as maximizing a black-box objective $J$ over the action sequence:

$$
a_1,\ldots, a_T =\argmax_{a_1,\ldots, a_T} J(a_1,\ldots, a_T)
$$

Writing $A = (a_1,\ldots, a_T)$ for the whole action sequence, this becomes

$$
A=\argmax_{A} J(A)
$$

The simplest method is guess and check, also called the "random shooting method":

- Sample $A_1, A_2, \ldots, A_n$ from some distribution (e.g. uniform).
- Choose the $A_i$ with $i = \argmax_i J(A_i)$.
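The two steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `random_shooting`, the uniform range `[-1, 1]`, and the quadratic `toy_objective` are all assumptions made for the example.

```python
import random


def random_shooting(J, horizon, n_samples, low=-1.0, high=1.0, seed=0):
    """Return the best of n_samples uniformly sampled action sequences."""
    rng = random.Random(seed)
    best_A, best_J = None, float("-inf")
    for _ in range(n_samples):
        # Sample one candidate sequence A = (a_1, ..., a_T) uniformly.
        A = [rng.uniform(low, high) for _ in range(horizon)]
        value = J(A)
        # Keep the candidate with the highest objective seen so far.
        if value > best_J:
            best_A, best_J = A, value
    return best_A, best_J


# Hypothetical toy objective: maximized when every a_t equals 0.5.
def toy_objective(A):
    return -sum((a - 0.5) ** 2 for a in A)


best_A, best_J = random_shooting(toy_objective, horizon=3, n_samples=2000)
```

Every sample is drawn independently of the others, so the evaluations of `J` could be run in parallel.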

#### Cross-Entropy Method with continuous-valued inputs

1. Sample $A_1, A_2, \ldots, A_n$ from some distribution $p(A)$.
2. Evaluate $J(A_1), J(A_2), \ldots, J(A_n)$.
3. Pick the _elites_ $A_{i_1}, A_{i_2}, \ldots, A_{i_m}$ with the highest $J(A_i)$, where $m<n$.
4. Refit the distribution $p(A)$ to the elites so that it is more likely to sample near them, then repeat from step 1.
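A minimal sketch of the four steps, assuming $p(A)$ is a Gaussian with independent dimensions (a common choice, but not one the notes specify). The function name, the default sample counts, the small variance floor, and the toy objective are likewise assumptions for illustration.

```python
import random
import statistics


def cross_entropy_method(J, horizon, n_samples=100, n_elites=10,
                         n_iters=20, seed=0):
    """Return the mean of p(A) after n_iters rounds of refitting."""
    rng = random.Random(seed)
    # Assumed initial distribution: zero mean, unit standard deviation.
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(n_iters):
        # 1. Sample n candidate action sequences from p(A).
        samples = [
            [rng.gauss(mean[t], std[t]) for t in range(horizon)]
            for _ in range(n_samples)
        ]
        # 2-3. Evaluate J and keep the m highest-scoring samples (elites).
        elites = sorted(samples, key=J, reverse=True)[:n_elites]
        # 4. Refit the Gaussian to the elites, one dimension at a time.
        for t in range(horizon):
            column = [A[t] for A in elites]
            mean[t] = statistics.fmean(column)
            # Small floor keeps the standard deviation strictly positive.
            std[t] = statistics.pstdev(column) + 1e-6
    return mean


# Hypothetical toy objective: maximized when every a_t equals 0.5.
def toy_objective(A):
    return -sum((a - 0.5) ** 2 for a in A)


plan = cross_entropy_method(toy_objective, horizon=3)
```

Compared with random shooting, the only change is that the sampling distribution is updated between rounds, so later samples concentrate where earlier ones scored well.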

Pros:

- Very fast to run if parallelized, since each sample is evaluated independently
- Extremely simple to implement

Cons:

- Very harsh dimensionality limit
- Open-loop planning only: the whole action sequence is chosen up front, without feedback from the states actually reached
- Suboptimal in stochastic environments

### Discrete Case: Monte Carlo Tree Search (MCTS)

MCTS treats discrete planning as a search problem.

Closed-loop planning:

- At each state, iteratively build a search tree to evaluate actions, select the best action found by the search, and then move to the next state.
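The loop described above can be sketched as follows. The notes do not say how the tree is built, so this sketch assumes one common variant: UCB1 for selection, uniformly random rollouts for evaluation, and the most-visited root action as the final choice. The `Node` class, the `mcts` signature, and the toy "maximize the sum" problem are hypothetical names introduced for the example.

```python
import math
import random


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}      # action -> child Node
        self.visits = 0
        self.total_value = 0.0


def mcts(root_state, actions, step, is_terminal, reward,
         n_iters=500, c=1.4, seed=0):
    """Build a search tree from root_state and return the chosen action."""
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_iters):
        node = root
        # Selection: descend with UCB1 while the node is fully expanded.
        while (not is_terminal(node.state)
               and len(node.children) == len(actions)):
            parent = node
            node = max(
                parent.children.values(),
                key=lambda ch: ch.total_value / ch.visits
                + c * math.sqrt(math.log(parent.visits) / ch.visits),
            )
        # Expansion: add one untried action as a new child.
        if not is_terminal(node.state):
            untried = [a for a in actions if a not in node.children]
            a = rng.choice(untried)
            child = Node(step(node.state, a), parent=node)
            node.children[a] = child
            node = child
        # Simulation: random rollout until a terminal state is reached.
        state = node.state
        while not is_terminal(state):
            state = step(state, rng.choice(actions))
        value = reward(state)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_value += value
            node = node.parent
    # Choose the root action whose child was visited most often.
    return max(root.children, key=lambda a: root.children[a].visits)


# Hypothetical toy problem: state = (running total, steps left).
ACTIONS = [0, 1, 2]


def step(state, a):
    return (state[0] + a, state[1] - 1)


def is_terminal(state):
    return state[1] == 0


def reward(state):
    return state[0] / 6.0   # normalized to [0, 1] for 3 steps of at most 2


best_action = mcts((0, 3), ACTIONS, step, is_terminal, reward)
```

To plan in closed loop, one would execute `best_action`, observe the resulting state, and call `mcts` again from that state.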

### Continuous Case: Trajectory Optimization

#### Linear Quadratic Regulator (LQR)

#### Non-linear iterative LQR (iLQR)/ Differential Dynamic Programming (DDP)