ROMI effectively fills in gaps in the state-action graph, improving training stability and performance when paired with conservative offline RL algorithms.
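To make the gap-filling idea concrete, here is a toy sketch (illustrative only; the states, actions, and helper functions are hypothetical and not from the lecture) that treats logged transitions as directed edges and shows how a single reverse-imagined transition can connect an otherwise unreachable high-reward fragment to the rest of the graph:

```python
from collections import defaultdict

def build_graph(transitions):
    """transitions: iterable of (state, action, next_state) with hashable states."""
    graph = defaultdict(set)
    for s, _a, s_next in transitions:
        graph[s].add(s_next)
    return graph

def reachable(graph, start):
    """All states reachable from `start` by following dataset edges forward."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        stack.extend(graph.get(s, ()))
    return seen

# Logged data covers two disconnected fragments: 0 -> 1 -> 2 and 5 -> 6 (a rewarding goal).
dataset = [(0, "a", 1), (1, "a", 2), (5, "a", 6)]
# One reverse-imagined transition proposes how the goal fragment could be reached from state 2.
imagined = [(2, "b", 5)]

print(sorted(reachable(build_graph(dataset), 0)))             # [0, 1, 2]
print(sorted(reachable(build_graph(dataset + imagined), 0)))  # [0, 1, 2, 5, 6]
```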
---
## Implicit Credit Assignment via Value Factorization Structures
Although initially studied for multi-agent systems, insights from value factorization also improve offline RL by providing structured credit assignment signals.

In architectures designed for IGM (Individual-Global-Max) consistency, gradients from the TD loss on the joint value flow through the monotonic mixing function into each per-component utility, providing an implicit credit assignment signal.

Even in single-agent structured RL, similar factorization structures allow credit to flow into components representing skills, modes, or action groups, enabling better temporal and structural decomposition.
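As a concrete illustration of how such a factorization routes credit, the sketch below (a QMIX-style monotonic mixer with hypothetical names, not code from the lecture) shows that a single TD loss on the joint value back-propagates a distinct gradient into each per-component utility; the non-negative, state-conditioned mixing weights are what enforce the monotonicity behind IGM consistency.

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """QMIX-style mixer: the joint value is monotonic in every per-component utility,
    so maximizing each utility separately also maximizes the joint value (IGM)."""

    def __init__(self, n_components: int, state_dim: int, hidden: int = 32):
        super().__init__()
        # Hypernetworks produce state-conditioned mixing weights; taking abs() keeps
        # the weights non-negative, which is what enforces monotonicity.
        self.w1 = nn.Linear(state_dim, n_components * hidden)
        self.b1 = nn.Linear(state_dim, hidden)
        self.w2 = nn.Linear(state_dim, hidden)
        self.b2 = nn.Linear(state_dim, 1)
        self.n, self.h = n_components, hidden

    def forward(self, utilities, state):
        # utilities: (batch, n_components), state: (batch, state_dim)
        w1 = torch.abs(self.w1(state)).view(-1, self.n, self.h)
        hid = torch.relu(torch.bmm(utilities.unsqueeze(1), w1) + self.b1(state).unsqueeze(1))
        w2 = torch.abs(self.w2(state)).view(-1, self.h, 1)
        q_tot = torch.bmm(hid, w2).squeeze(-1) + self.b2(state)
        return q_tot.squeeze(-1)  # (batch,)

# A single TD loss on the joint value sends a distinct gradient into each utility.
mixer = MonotonicMixer(n_components=4, state_dim=8)
utilities = torch.randn(16, 4, requires_grad=True)
state = torch.randn(16, 8)
loss = ((mixer(utilities, state) - torch.randn(16)) ** 2).mean()
loss.backward()
print(utilities.grad.shape)  # torch.Size([16, 4]): per-component credit signal
```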
---
## Model-Based vs Model-Free Offline RL
Lecture 23 contrasts model-based imagination (ROMI) with conservative model-free methods such as IQL and CQL.
Forward model-based methods typically limit exploration into uncertain model regions to control compounding model error. ROMI takes a different approach:
- ROMI expands *backward*, staying consistent with known good future states.
- ROMI reduces error accumulation because the future anchors are real dataset states (see the sketch below).
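
The sketch below illustrates the backward expansion under stated assumptions: a reverse policy and a reverse dynamics model (stand-in callables with hypothetical names, not the lecture's implementation) roll out predecessors of real dataset states, so every imagined trace terminates at a real anchor.

```python
import numpy as np

def reverse_rollouts(anchor_states, reverse_policy, reverse_dynamics, reward_fn, horizon=5):
    """Generate imagined transitions by rolling *backward* from real dataset states.

    anchor_states   : states sampled from the offline dataset (the real anchors)
    reverse_policy  : s_t            -> a_{t-1}   (which action likely led into s_t)
    reverse_dynamics: (s_t, a_{t-1}) -> s_{t-1}   (which state that action came from)
    reward_fn       : (s, a, s')     -> r
    Returns (s, a, r, s') tuples; every backward trace ends at a real anchor state.
    """
    imagined = []
    for s_next in anchor_states:
        s_t = np.asarray(s_next, dtype=float)
        for _ in range(horizon):
            a_prev = reverse_policy(s_t)             # action believed to lead into s_t
            s_prev = reverse_dynamics(s_t, a_prev)   # its predecessor state
            imagined.append((s_prev, a_prev, reward_fn(s_prev, a_prev, s_t), s_t))
            s_t = s_prev                             # keep stepping backward
    return imagined

# Toy usage with stand-in models (2-D states, 1-D actions).
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 2))  # pretend these were sampled from the dataset
rollouts = reverse_rollouts(
    anchors,
    reverse_policy=lambda s: rng.normal(size=1),
    reverse_dynamics=lambda s, a: s - 0.1 * a,
    reward_fn=lambda s, a, s_next: -float(np.linalg.norm(s_next)),
)
print(len(rollouts))  # 4 anchors x horizon 5 = 20 imagined transitions
```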
---
## Combining ROMI With Conservative Offline RL
ROMI is typically combined with conservative model-free learners such as CQL (a minimal sketch follows the list below).

Benefits:
- Increased policy improvement over the dataset.
- More stable Q-learning backups.
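
Here is a minimal sketch of how the augmented data might be consumed, assuming imagined transitions are simply mixed into the training batches and optimized with a CQL-style conservatism penalty; the network, batch format, and hyperparameters are illustrative, not from the lecture.

```python
import random
import torch
import torch.nn as nn

def cql_update(q_net, target_q, optimizer, real_batch, imagined_batch, gamma=0.99, alpha=1.0):
    """One Q-learning step on a mix of real and reverse-imagined transitions,
    with a CQL-style penalty that keeps Q-values pessimistic off the data."""
    # Once generated, imagined transitions are treated exactly like real ones.
    batch = real_batch + imagined_batch
    random.shuffle(batch)
    s, a, r, s_next = (torch.stack(x) for x in zip(*batch))

    q_all = q_net(s)                                          # (B, num_actions)
    q_sa = q_all.gather(1, a.long().view(-1, 1)).squeeze(1)   # Q of taken actions
    with torch.no_grad():
        target = r + gamma * target_q(s_next).max(dim=1).values

    td_loss = ((q_sa - target) ** 2).mean()
    # CQL penalty: push Q down on all actions, up on the actions actually taken.
    cql_penalty = (torch.logsumexp(q_all, dim=1) - q_sa).mean()
    loss = td_loss + alpha * cql_penalty

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: a tiny discrete-action Q-network over random 3-D states.
q_net = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 4))
target_q = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
make_batch = lambda: [(torch.randn(3), torch.tensor(1), torch.randn(()), torch.randn(3))
                      for _ in range(8)]
print(cql_update(q_net, target_q, opt, real_batch=make_batch(), imagined_batch=make_batch()))
```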
---
## Summary of Lecture 23
Key points:
- Reverse imagination avoids pitfalls of forward model error.
- Factored value structures provide implicit counterfactual credit assignment.
- Combining ROMI with conservative learners yields state-of-the-art performance.
---
## Recommended Screenshot Frames for Lecture 23
- Lecture 23, page 20: ROMI concept diagram depicting reverse imagination from goal states. Subsection: "Reverse Model-Based Imagination (ROMI)".
- Lecture 23, page 24: Architecture figure showing reverse policy and reverse dynamics model used to generate imagined transitions. Subsection: "Reverse Imagination Process".