CSE510 Deep Reinforcement Learning (Lecture 22)
Offline Reinforcement Learning
Requirements for Current Successes
- Access to the Environment Model or Simulator
- Exploration and trial-and-error are not prohibitively costly
Background: Offline RL
- The success of modern machine learning
- Scalable data-driven learning methods (GPT-4, CLIP, DALL·E, Sora)
- Reinforcement learning
- Online learning paradigm
- Interaction is expensive & dangerous
- Healthcare, Robotics, Recommendation...
- Can we develop data-driven offline RL?
Definition in Offline RL
- The policy \pi_k is updated with a static dataset \mathcal{D}, which is collected by an unknown behavior policy \pi_\beta
- Interaction with the environment is not allowed
- \mathcal{D}=\{(s_i,a_i,s_i',r_i)\}, where
- s\sim d^{\pi_\beta}(s)
- a\sim \pi_\beta(a|s)
- s'\sim p(s'|s,a)
- r\gets r(s,a)
Objective:
\max_\pi\sum _{t=0}^{T}\mathbb{E}_{s_t\sim d^\pi(s),a_t\sim \pi(a|s)}[\gamma^tr(s_t,a_t)]
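A minimal sketch of what this setting means in code (hypothetical names, NumPy only; not from the lecture): the dataset is collected once by \pi_\beta and is only ever resampled during training, never extended by environment interaction, and the objective is a discounted return estimated from logged trajectories.
```python
import numpy as np

# Hypothetical sketch: a static offline dataset D = {(s_i, a_i, s_i', r_i)}
# collected once by an unknown behavior policy pi_beta; no env.step() is ever called.
class OfflineDataset:
    def __init__(self, states, actions, next_states, rewards):
        self.states = np.asarray(states)
        self.actions = np.asarray(actions)
        self.next_states = np.asarray(next_states)
        self.rewards = np.asarray(rewards)

    def sample(self, batch_size, rng=np.random):
        # Training only ever resamples these fixed transitions.
        idx = rng.randint(0, len(self.rewards), size=batch_size)
        return (self.states[idx], self.actions[idx],
                self.next_states[idx], self.rewards[idx])

def discounted_return(rewards, gamma=0.99):
    # Monte-Carlo value of one logged trajectory: sum_t gamma^t * r_t,
    # the quantity the objective maximizes in expectation under d^pi.
    return sum(gamma ** t * r for t, r in enumerate(rewards))
```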
Key challenge in Offline RL
Distribution Shift
Can we simply apply traditional reinforcement learning with bootstrapping?
Q(s,a)=r(s,a)+\gamma \max_{a'\in A} Q(s',a')
\pi(s)=\arg\max_{a\in A} Q(s,a)
but note that the state-action distribution of the behavior policy differs from that of the learned policy:
P_{\pi_\beta}(s,a)\neq P_{\pi}(s,a)
so the bootstrapped target \max_{a'\in A} Q(s',a') can query actions the dataset never covers, and these out-of-distribution values are never corrected by new experience.
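To make this failure mode concrete, here is a hedged tabular sketch of fitted Q-iteration run on a static dataset (function and variable names are illustrative, not from the lecture). The max over a' in the backup evaluates Q at actions \pi_\beta may never have taken, and the greedy policy extracted at the end induces a different state-action distribution than the data it was trained on.
```python
import numpy as np

def fitted_q_iteration(dataset, n_states, n_actions, gamma=0.99, iters=100):
    """Tabular fitted Q-iteration on a fixed set of (s, a, s', r) transitions.

    The backup Q(s,a) <- r + gamma * max_a' Q(s',a') evaluates Q at actions a'
    that the behavior policy may never have taken in s'; those out-of-distribution
    values are never corrected by new data, so their errors propagate via the max.
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        Q_new = Q.copy()
        for (s, a, s_next, r) in dataset:
            # Max over ALL actions a', including ones unseen in the dataset at s_next.
            Q_new[s, a] = r + gamma * np.max(Q[s_next])
        Q = Q_new
    # Greedy policy pi(s) = argmax_a Q(s,a): its state-action distribution
    # differs from the behavior policy's, i.e. distribution shift at deployment.
    policy = np.argmax(Q, axis=1)
    return Q, policy
```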