This commit is contained in:
Trance-0
2025-11-18 14:08:20 -06:00
parent 9416bd4956
commit 2946feefbe
4 changed files with 18 additions and 72 deletions


ROMI effectively fills in missing transitions in the state-action graph, improving training stability and performance when paired with conservative offline RL algorithms.
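As a concrete sketch of the idea, imagined transitions can be generated by walking backward from real dataset states with a learned reverse policy and reverse dynamics model. The toy 1-D models and names below (`reverse_policy`, `reverse_dynamics`) are illustrative assumptions, not ROMI's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned reverse models for a toy 1-D task: given a state,
# the reverse policy proposes an action that could have led into it, and
# the reverse dynamics model predicts the predecessor state.
def reverse_policy(s_next):
    return rng.choice([-1.0, 1.0])            # a ~ p(a | s')

def reverse_dynamics(s_next, a):
    return s_next - a + rng.normal(0.0, 0.01)  # s ~ p(s | s', a)

def reverse_rollout(real_state, horizon):
    """Generate imagined transitions (s, a, s') that END at a real
    dataset state, stepping backward for `horizon` steps."""
    transitions = []
    s_next = real_state
    for _ in range(horizon):
        a = reverse_policy(s_next)
        s = reverse_dynamics(s_next, a)
        transitions.append((s, a, s_next))     # anchored toward real data
        s_next = s
    return transitions

# Imagined transitions are appended to the offline buffer before training.
buffer = reverse_rollout(real_state=5.0, horizon=3)
```

The key property is visible in the data layout: every generated chain terminates at a state actually observed in the dataset.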
---
## Implicit Credit Assignment via Value Factorization Structures
Although initially studied for multi-agent systems, insights from value factorization also improve offline RL by providing structured credit assignment signals.
Even in single-agent structured RL, similar factorization structures allow credit to flow into components representing skills, modes, or action groups, enabling better temporal and structural decomposition.
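A minimal illustration of IGM-consistent factorization, using VDN-style additive mixing as a simplifying assumption (QMIX generalizes this to any monotonic mixing network); the utility values are invented for the example:

```python
import numpy as np

# Two components (agents or skill heads), each with a local utility per
# local action. An additive mix Q_tot = Q1 + Q2 satisfies IGM: maximizing
# each component individually also maximizes the joint value, and the TD
# gradient dQ_tot/dQ_i = 1 routes the global signal to every component.
Q1 = np.array([1.0, 3.0])        # component 1's utility per local action
Q2 = np.array([2.0, 0.5])        # component 2's utility per local action

# Joint values over all action pairs, via broadcasting.
Q_tot = Q1[:, None] + Q2[None, :]

joint_best = np.unravel_index(Q_tot.argmax(), Q_tot.shape)
local_best = (Q1.argmax(), Q2.argmax())
print(joint_best == local_best)   # True: IGM consistency holds
```

Because the mix is monotonic, the cheap per-component argmax recovers the joint argmax without enumerating the exponential joint action space.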
---
## Model-Based vs Model-Free Offline RL
Lecture 23 contrasts model-based imagination (ROMI) with conservative model-free methods such as IQL and CQL.
These methods limit exploration into uncertain model regions.
- ROMI expands *backward*, staying consistent with known good future states.
- ROMI reduces error accumulation because its future anchors are real dataset states.
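A toy 1-D example of why the anchoring matters (the dynamics and the bias value are invented for illustration): a biased forward model compounds error over the rollout length, while a backward rollout's endpoint is a real state by construction, so its error sits at the imagined start rather than where the trajectory joins the data.

```python
# Toy 1-D dynamics: the true step is s -> s + 1; the learned model
# carries a small per-step bias (a hypothetical number).
bias = 0.05

def forward_rollout(s0, k):
    # Each imagined step feeds the model its own slightly wrong output,
    # so endpoint error grows with rollout length k.
    s = s0
    for _ in range(k):
        s = s + 1 + bias          # learned forward model
    return s

def backward_rollout(sk, k):
    # Reverse imagination starts from a REAL dataset state sk; the
    # endpoint handed to Q-backups is exact by construction.
    s = sk
    for _ in range(k):
        s = s - 1 - bias          # learned reverse model, same bias
    return s

k = 10
true_end = 0.0 + k                          # ground truth from s0 = 0
print(forward_rollout(0.0, k) - true_end)   # endpoint error ~ k * bias
```

The same per-step model error exists in both directions; the difference is where it accumulates relative to the states used as backup targets.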
---
## Combining ROMI With Conservative Offline RL
ROMI is typically combined with:
Benefits:
- Increased policy improvement over the dataset behavior policy.
- More stable Q-learning backups.
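A rough tabular sketch of this pairing, using a CQL-style conservative update over a buffer that mixes real and imagined backward transitions (states, rewards, and the penalty weight `alpha` are illustrative, not tuned values):

```python
import numpy as np

alpha = 1.0                                 # conservatism weight (illustrative)
gamma, lr = 0.99, 0.1
Q = np.zeros((4, 2))                        # toy tabular Q: 4 states, 2 actions

real_data = [(0, 1, 1.0, 1), (1, 0, 0.0, 2)]   # (s, a, r, s') from the dataset
imagined = [(3, 1, 0.0, 0)]                    # ROMI-style backward transitions
batch = real_data + imagined

for s, a, r, s_next in batch:
    td_err = r + gamma * Q[s_next].max() - Q[s, a]
    # CQL(H)-style gradient: logsumexp over actions pushes ALL actions
    # down via softmax weights, while the data action is pushed back up.
    softmax = np.exp(Q[s]) / np.exp(Q[s]).sum()
    Q[s] -= lr * alpha * softmax            # penalize out-of-data actions
    Q[s, a] += lr * (td_err + alpha)        # TD step + data-action bonus
```

The imagined transitions behave exactly like real ones inside the update; conservatism is still applied uniformly, so the learner stays pessimistic about actions unsupported by either source.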
---
## Summary of Lecture 23
Key points:
- Reverse imagination avoids pitfalls of forward model error.
- Factored value structures provide implicit counterfactual credit assignment.
- Combining ROMI with conservative learners yields state-of-the-art performance.
---
## Recommended Screenshot Frames for Lecture 23
- Lecture 23, page 20: ROMI concept diagram depicting reverse imagination from goal states. Subsection: "Reverse Model-Based Imagination (ROMI)".
- Lecture 23, page 24: Architecture figure showing reverse policy and reverse dynamics model used to generate imagined transitions. Subsection: "Reverse Imagination Process".