updates

content/CSE510/CSE510_L22.md (new file, 50 lines)

@@ -0,0 +1,50 @@
# CSE510 Deep Reinforcement Learning (Lecture 22)

## Offline Reinforcement Learning

### Requirements for Current Successes

- Access to the environment model or a simulator
- Exploration and trial-and-error are not costly

#### Background: Offline RL

- The success of modern machine learning
  - Scalable data-driven learning methods (GPT-4, CLIP, DALL·E, Sora)
- Reinforcement learning
  - Online learning paradigm
  - Interaction is expensive & dangerous
    - Healthcare, robotics, recommendation...
- Can we develop data-driven offline RL?

#### Definition of Offline RL

- The policy $\pi_k$ is updated with a static dataset $\mathcal{D}$, which was collected by an _unknown behavior policy_ $\pi_\beta$
  - Interaction with the environment is not allowed
- $\mathcal{D}=\{(s_i,a_i,s_i',r_i)\}$
  - $s\sim d^{\pi_\beta}(s)$
  - $a\sim \pi_\beta(a|s)$
  - $s'\sim p(s'|s,a)$
  - $r\gets r(s,a)$
- Objective: $\max_\pi\sum_{t=0}^{T}\mathbb{E}_{s_t\sim d^\pi(s),\,a_t\sim \pi(a|s_t)}[\gamma^t r(s_t,a_t)]$
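As a concrete illustration, the dataset definition above can be sketched in a toy chain MDP. Everything here (the chain dynamics, the 80/20 behavior policy, the helper names) is an illustrative assumption, not part of the lecture:

```python
import random

N_STATES = 5  # toy chain MDP with states 0..4

def behavior_policy(s):
    # pi_beta(a|s): mostly move right, sometimes left (unknown to the learner)
    return random.choices([+1, -1], weights=[0.8, 0.2])[0]

def step(s, a):
    # p(s'|s,a) and r(s,a): deterministic chain, reward 1 for reaching the end
    s_next = max(0, min(N_STATES - 1, s + a))
    r = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, r

random.seed(0)
D = []  # the static dataset D = {(s_i, a_i, s_i', r_i)}
s = 0
for _ in range(1000):
    a = behavior_policy(s)
    s_next, r = step(s, a)
    D.append((s, a, s_next, r))
    s = 0 if s_next == N_STATES - 1 else s_next  # reset at terminal

# The learner only ever sees D; no further interaction is allowed.
print(len(D))
```

The key constraint is the last comment: once `D` is collected, `behavior_policy` and `step` are no longer available to the learner.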

#### Key challenge in Offline RL

**Distribution shift.**

What happens if we simply apply traditional reinforcement learning with bootstrapping?

$$
Q(s,a)=r(s,a)+\gamma \max_{a'\in A} Q(s',a')
$$

$$
\pi(s)=\arg\max_{a\in A} Q(s,a)
$$
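A minimal sketch of what this bootstrapped update looks like when run naively on a static dataset, in the exact tabular case (the toy transitions below are assumptions for illustration):

```python
from collections import defaultdict

GAMMA = 0.9
ACTIONS = [-1, +1]

# A tiny fixed dataset of (s, a, s', r) transitions on a chain of states 0..3.
D = [(0, +1, 1, 0.0), (1, +1, 2, 0.0), (2, +1, 3, 1.0), (1, -1, 0, 0.0)]

Q = defaultdict(float)  # Q(s,a), initialized to 0
for _ in range(100):  # sweep the static dataset repeatedly
    for s, a, s_next, r in D:
        # Q(s,a) <- r(s,a) + gamma * max_a' Q(s',a')
        Q[(s, a)] = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)

# pi(s) = argmax_a Q(s,a)
pi = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(4)}
print(pi)
```

Note that the `max` over `b` queries Q at state-action pairs that may never appear in `D`; here they harmlessly stay at their initial value 0, but with function approximation those unseen values are arbitrary.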

But notice that the state-action distribution induced by the behavior policy differs from that induced by the learned policy:

$$
P_{\pi_\beta}(s,a)\neq P_{\pi}(s,a)
$$

so the bootstrapped $\max_{a'\in A}$ evaluates $Q$ at out-of-distribution actions that the static dataset never covers, and those errors cannot be corrected by further interaction.
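A small sketch of why this mismatch bites: if the learned Q has errors at state-action pairs the behavior policy never visited (as any function approximator will), the bootstrap's $\max_{a'}$ routinely picks exactly those pairs. The dataset and the random-error model below are illustrative assumptions:

```python
import random
from collections import defaultdict

random.seed(0)
ACTIONS = list(range(4))

# The behavior policy only ever took actions 0 or 1; actions 2 and 3 are unseen.
D = [(s, random.choice([0, 1]), (s + 1) % 10, 0.0)
     for s in range(10) for _ in range(5)]
seen = {(s, a) for s, a, _, _ in D}

# A "learned" Q with small random errors at every (s, a), as function
# approximation would produce -- including at pairs never in the data.
Q = defaultdict(lambda: random.gauss(0.0, 1.0))

# Count bootstrap targets whose argmax action was never taken in s' by pi_beta.
ood = sum(
    1 for _, _, s_next, _ in D
    if (s_next, max(ACTIONS, key=lambda b: Q[(s_next, b)])) not in seen
)
print(f"{ood}/{len(D)} bootstrap targets use an out-of-distribution action")
```

Since the argmax is roughly uniform over the four actions while the data covers at most two per state, a large fraction of targets rest on Q-values the dataset can never correct.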
@@ -24,4 +24,5 @@ export default {
  CSE510_L19: "CSE510 Deep Reinforcement Learning (Lecture 19)",
  CSE510_L20: "CSE510 Deep Reinforcement Learning (Lecture 20)",
  CSE510_L21: "CSE510 Deep Reinforcement Learning (Lecture 21)",
  CSE510_L22: "CSE510 Deep Reinforcement Learning (Lecture 22)",
}