This commit is contained in:
Trance-0
2025-11-18 14:08:20 -06:00
parent 9416bd4956
commit 2946feefbe
4 changed files with 18 additions and 72 deletions


ROMI effectively fills in missing transitions in the state-action graph, improving training stability and performance when paired with conservative offline RL algorithms.
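As a concrete sketch of the idea, imagined transitions can be generated by walking backward from real dataset states with a learned reverse policy and reverse dynamics model. The toy 1-D models and names below (`reverse_policy`, `reverse_dynamics`) are illustrative assumptions, not ROMI's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned reverse models for a toy 1-D task: given a state,
# the reverse policy proposes an action that could have led into it, and
# the reverse dynamics model predicts the predecessor state.
def reverse_policy(s_next):
    return rng.choice([-1.0, 1.0])            # a ~ p(a | s')

def reverse_dynamics(s_next, a):
    return s_next - a + rng.normal(0.0, 0.01)  # s ~ p(s | s', a)

def reverse_rollout(real_state, horizon):
    """Generate imagined transitions (s, a, s') that END at a real
    dataset state, stepping backward for `horizon` steps."""
    transitions = []
    s_next = real_state
    for _ in range(horizon):
        a = reverse_policy(s_next)
        s = reverse_dynamics(s_next, a)
        transitions.append((s, a, s_next))     # anchored toward real data
        s_next = s
    return transitions

# Imagined transitions are appended to the offline buffer before training.
buffer = reverse_rollout(real_state=5.0, horizon=3)
```

The key property is visible in the data layout: every generated chain terminates at a state actually observed in the dataset.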
---
## Implicit Credit Assignment via Value Factorization Structures
Although initially studied for multi-agent systems, insights from value factorization also improve offline RL by providing structured credit assignment signals.
Even in single-agent structured RL, similar factorization structures allow credit to flow into components representing skills, modes, or action groups, enabling better temporal and structural decomposition.
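A minimal illustration of IGM-consistent factorization, using VDN-style additive mixing as a simplifying assumption (QMIX generalizes this to any monotonic mixing network); the utility values are invented for the example:

```python
import numpy as np

# Two components (agents or skill heads), each with a local utility per
# local action. An additive mix Q_tot = Q1 + Q2 satisfies IGM: maximizing
# each component individually also maximizes the joint value, and the TD
# gradient dQ_tot/dQ_i = 1 routes the global signal to every component.
Q1 = np.array([1.0, 3.0])        # component 1's utility per local action
Q2 = np.array([2.0, 0.5])        # component 2's utility per local action

# Joint values over all action pairs, via broadcasting.
Q_tot = Q1[:, None] + Q2[None, :]

joint_best = np.unravel_index(Q_tot.argmax(), Q_tot.shape)
local_best = (Q1.argmax(), Q2.argmax())
print(joint_best == local_best)   # True: IGM consistency holds
```

Because the mix is monotonic, the cheap per-component argmax recovers the joint argmax without enumerating the exponential joint action space.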
---
## Model-Based vs Model-Free Offline RL
Lecture 23 contrasts model-based imagination (ROMI) with conservative model-free methods such as IQL and CQL.
These methods limit exploration into uncertain model regions.
- ROMI expands *backward*, staying consistent with known good future states.
- ROMI reduces error accumulation because its future anchors are real dataset states.
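A toy 1-D example of why the anchoring matters (the dynamics and the bias value are invented for illustration): a biased forward model compounds error over the rollout length, while a backward rollout's endpoint is a real state by construction, so its error sits at the imagined start rather than where the trajectory joins the data.

```python
# Toy 1-D dynamics: the true step is s -> s + 1; the learned model
# carries a small per-step bias (a hypothetical number).
bias = 0.05

def forward_rollout(s0, k):
    # Each imagined step feeds the model its own slightly wrong output,
    # so endpoint error grows with rollout length k.
    s = s0
    for _ in range(k):
        s = s + 1 + bias          # learned forward model
    return s

def backward_rollout(sk, k):
    # Reverse imagination starts from a REAL dataset state sk; the
    # endpoint handed to Q-backups is exact by construction.
    s = sk
    for _ in range(k):
        s = s - 1 - bias          # learned reverse model, same bias
    return s

k = 10
true_end = 0.0 + k                          # ground truth from s0 = 0
print(forward_rollout(0.0, k) - true_end)   # endpoint error ~ k * bias
```

The same per-step model error exists in both directions; the difference is where it accumulates relative to the states used as backup targets.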
---
## Combining ROMI With Conservative Offline RL
ROMI is typically combined with:
Benefits:
- Increased policy improvement over the dataset behavior policy.
- More stable Q-learning backups.
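A rough tabular sketch of this pairing, using a CQL-style conservative update over a buffer that mixes real and imagined backward transitions (states, rewards, and the penalty weight `alpha` are illustrative, not tuned values):

```python
import numpy as np

alpha = 1.0                                 # conservatism weight (illustrative)
gamma, lr = 0.99, 0.1
Q = np.zeros((4, 2))                        # toy tabular Q: 4 states, 2 actions

real_data = [(0, 1, 1.0, 1), (1, 0, 0.0, 2)]   # (s, a, r, s') from the dataset
imagined = [(3, 1, 0.0, 0)]                    # ROMI-style backward transitions
batch = real_data + imagined

for s, a, r, s_next in batch:
    td_err = r + gamma * Q[s_next].max() - Q[s, a]
    # CQL(H)-style gradient: logsumexp over actions pushes ALL actions
    # down via softmax weights, while the data action is pushed back up.
    softmax = np.exp(Q[s]) / np.exp(Q[s]).sum()
    Q[s] -= lr * alpha * softmax            # penalize out-of-data actions
    Q[s, a] += lr * (td_err + alpha)        # TD step + data-action bonus
```

The imagined transitions behave exactly like real ones inside the update; conservatism is still applied uniformly, so the learner stays pessimistic about actions unsupported by either source.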
---
## Summary of Lecture 23
Key points:
- Reverse imagination avoids pitfalls of forward model error.
- Factored value structures provide implicit counterfactual credit assignment.
- Combining ROMI with conservative learners yields state-of-the-art performance.
---
## Recommended Screenshot Frames for Lecture 23
- Lecture 23, page 20: ROMI concept diagram depicting reverse imagination from goal states. Subsection: "Reverse Model-Based Imagination (ROMI)".
- Lecture 23, page 24: Architecture figure showing reverse policy and reverse dynamics model used to generate imagined transitions. Subsection: "Reverse Imagination Process".