Trance-0
2025-11-18 14:08:20 -06:00
parent 9416bd4956
commit 2946feefbe
4 changed files with 18 additions and 72 deletions


@@ -105,9 +105,7 @@ There are two primary families of solutions:
1. **Policy constraint methods**
2. **Conservative value estimation methods**
---
# 1. Policy Constraint Methods
## 1. Policy Constraint Methods
These methods restrict the learned policy to stay close to the behavior policy so it does not take unsupported actions.
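As a rough illustration of the policy-constraint idea (a sketch, not the lecture's exact algorithm), the actor can maximize the critic's value while being penalized for deviating from dataset actions; the helper names below are hypothetical, PyTorch and continuous actions are assumed:

```python
import torch

def constrained_actor_loss(actor, critic, states, dataset_actions, lam=2.5):
    """Generic policy-constraint sketch (TD3+BC-style): maximize Q(s, pi(s))
    while penalizing deviation from actions actually seen in the dataset."""
    pi_actions = actor(states)                              # actions proposed by the learned policy
    q_values = critic(states, pi_actions)                   # critic's evaluation of those actions
    bc_term = ((pi_actions - dataset_actions) ** 2).mean()  # behavior-cloning penalty
    scale = lam / q_values.abs().mean().detach()            # keeps the two terms on a comparable scale
    return -scale * q_values.mean() + bc_term
```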
@@ -163,9 +161,7 @@ Parameter explanations:
BEAR controls distribution shift more tightly than BCQ.
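For concreteness, BEAR's support constraint is typically enforced with a kernel MMD between policy actions and dataset actions; a minimal sketch of such an estimator (Gaussian kernel, illustrative bandwidth) might look like:

```python
import torch

def mmd_gaussian(policy_actions, data_actions, sigma=20.0):
    """Kernel MMD between two action batches: the quantity BEAR constrains to
    keep the learned policy inside the support of the behavior policy."""
    def kernel(a, b):
        sq_dist = ((a.unsqueeze(1) - b.unsqueeze(0)) ** 2).sum(-1)  # pairwise squared distances
        return torch.exp(-sq_dist / (2 * sigma ** 2))
    return (kernel(policy_actions, policy_actions).mean()
            + kernel(data_actions, data_actions).mean()
            - 2 * kernel(policy_actions, data_actions).mean())
```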
---
# 2. Conservative Value Function Methods
## 2. Conservative Value Function Methods
These methods modify Q-learning so Q-values of unseen actions are *underestimated*, preventing the policy from exploiting overestimated values.
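A minimal sketch of a CQL-style regularizer, assuming a PyTorch critic and actions normalized to [-1, 1] (helper names are illustrative):

```python
import torch

def cql_penalty(critic, states, dataset_actions, num_random=10):
    """CQL-style regularizer sketch: push Q down on sampled (possibly unseen)
    actions and up on dataset actions, so off-support actions end up underestimated."""
    batch, act_dim = dataset_actions.shape
    rand_actions = torch.empty(batch * num_random, act_dim).uniform_(-1.0, 1.0)
    rep_states = states.repeat_interleave(num_random, dim=0)
    q_rand = critic(rep_states, rand_actions).reshape(batch, num_random)
    q_data = critic(states, dataset_actions).reshape(batch)
    # logsumexp acts as a soft maximum of Q over the sampled actions.
    return (torch.logsumexp(q_rand, dim=1) - q_data).mean()
```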
@@ -213,9 +209,7 @@ Key idea:
IQL often achieves state-of-the-art performance due to its simplicity and stability.
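IQL's central trick, expectile regression on dataset actions only, is compact enough to sketch directly (PyTorch assumed):

```python
import torch

def expectile_loss(value_pred, q_target, tau=0.7):
    """IQL-style expectile regression sketch: with tau > 0.5 the value function
    tracks an upper expectile of Q over dataset actions, so training never
    queries out-of-distribution actions."""
    diff = q_target - value_pred
    weight = torch.abs(tau - (diff < 0).float())  # tau when diff >= 0, (1 - tau) otherwise
    return (weight * diff ** 2).mean()
```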
---
# Model-Based Offline RL
## Model-Based Offline RL
### Forward Model-Based RL
@@ -248,9 +242,7 @@ Parameter explanations:
These methods penalize or truncate rollouts that enter unknown regions of the learned model, limiting exploration where the model is unreliable.
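A MOPO-style version of this idea penalizes imagined rewards by the dynamics model's uncertainty; the sketch below uses ensemble disagreement as that uncertainty estimate (names and tensor shapes are assumptions):

```python
import torch

def penalized_reward(predicted_reward, next_state_ensemble, lam=1.0):
    """MOPO-style reward penalty sketch: subtract an uncertainty estimate so
    rollouts are discouraged in regions the dataset does not cover.
    next_state_ensemble: (ensemble_size, batch, state_dim) model predictions."""
    disagreement = next_state_ensemble.std(dim=0).norm(dim=-1)  # per-transition uncertainty
    return predicted_reward - lam * disagreement
```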
---
# Reverse Model-Based Imagination (ROMI)
## Reverse Model-Based Imagination (ROMI)
ROMI generates new training data by *backward* imagination: a reverse dynamics model imagines trajectories that roll backward from states in the dataset, so every imagined transition leads into supported states.
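A rough sketch of backward imagination, assuming a learned reverse dynamics model and reverse policy (all names are illustrative):

```python
def reverse_rollout(reverse_model, reverse_policy, dataset_states, horizon=5):
    """ROMI-style backward imagination sketch: starting from states that appear
    in the dataset, a reverse policy proposes actions and a reverse dynamics
    model predicts predecessor states, so imagined trajectories lead back into
    the data support."""
    transitions = []
    state = dataset_states
    for _ in range(horizon):
        action = reverse_policy(state)             # action believed to lead *into* `state`
        prev_state = reverse_model(state, action)  # predicted predecessor state
        transitions.append((prev_state, action, state))  # stored as a forward transition
        state = prev_state
    return transitions
```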
@@ -288,8 +280,6 @@ Benefits:
ROMI combined with conservative RL often outperforms standard offline methods.
---
# Summary of Lecture 22
Offline RL requires balancing:
@@ -304,14 +294,3 @@ Three major families of solutions:
3. Model-based conservatism and imagination (MOPO, MOReL, ROMI)
Offline RL is becoming practical for real-world domains such as healthcare, robotics, autonomous driving, and recommender systems.
---
# Recommended Screenshot Frames for Lecture 22
- Lecture 22, page 7: Offline RL diagram showing policy learning from a fixed dataset, subsection "Offline RL Setting".
- Lecture 22, page 35: Illustration of dataset support vs policy action distribution, subsection "Strategies for Safe Offline RL".
---
**End of CSE510_L22.md**