This commit is contained in:
Zheyuan Wu
2025-10-21 12:50:11 -05:00
parent 9f7d99b745
commit b845aca63c
2 changed files with 354 additions and 1 deletions

@@ -148,3 +148,30 @@ $$
y_1 = r + \gamma \min_{i=1,2} Q^{\theta_i'}(s', \pi_{\phi_i}(s'))
$$
High-variance estimates provide a noisy gradient.
Techniques used in TD3 to reduce this variance:
- Update the policy at a lower frequency than the value network (delayed policy updates).
- Smooth the value estimate (target policy smoothing), as sketched in code after the equations below:
$$
y=r+\gamma \mathbb{E}_{\epsilon}[Q^{\theta'}(s', \pi_{\phi'}(s')+\epsilon)]
$$
Update target (in practice the expectation is approximated by a single noisy sample):
$$
y = r + \gamma Q^{\theta'}(s', \pi_{\phi'}(s') + \epsilon)
$$
where $\epsilon \sim \operatorname{clip}(\mathcal{N}(0, \sigma), -c, c)$.
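
A minimal PyTorch sketch of this target computation, combining clipped double Q-learning with target policy smoothing. The network and variable names (`actor_target`, `critic1_target`, `critic2_target`, `policy_delay`) are illustrative assumptions, not the paper's reference code; the action space is assumed tanh-bounded.

```python
# Sketch of the TD3 target y = r + gamma * min_i Q^{theta_i'}(s', pi_{phi'}(s') + eps).
import torch
import torch.nn as nn

state_dim, action_dim, max_action = 3, 1, 1.0

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

actor_target = mlp(state_dim, action_dim)        # pi_{phi'}
critic1_target = mlp(state_dim + action_dim, 1)  # Q^{theta_1'}
critic2_target = mlp(state_dim + action_dim, 1)  # Q^{theta_2'}

gamma, sigma, c = 0.99, 0.2, 0.5  # discount, smoothing-noise std, noise clip bound

def td3_target(reward, next_state, not_done):
    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped Gaussian noise.
        next_action = torch.tanh(actor_target(next_state)) * max_action
        noise = (torch.randn_like(next_action) * sigma).clamp(-c, c)
        next_action = (next_action + noise).clamp(-max_action, max_action)
        # Clipped double Q-learning: bootstrap from the minimum of the two target critics.
        sa = torch.cat([next_state, next_action], dim=-1)
        q_min = torch.min(critic1_target(sa), critic2_target(sa))
        return reward + not_done * gamma * q_min

# Both critics regress toward this target every step, while the actor and the
# target networks are updated only every `policy_delay` steps (2 in the paper).
batch = 8
y = td3_target(torch.zeros(batch, 1), torch.randn(batch, state_dim), torch.ones(batch, 1))
```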
#### Other methods
- Generalizable Episode Memory for Deep Reinforcement Learning
- Distributed Distributional Deep Deterministic Policy Gradient
  - Distributional critic
  - N-step returns are used to update the critic (see the sketch after this list)
  - Multiple distributed parallel actors
  - Prioritized experience replay
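
A short generic sketch of the N-step return target used for the critic update. It uses a scalar bootstrap value for simplicity (an assumption for illustration; D4PG's actual critic is distributional, so the bootstrap there is a value distribution rather than a scalar).

```python
# N-step return: y = sum_{k=0}^{N-1} gamma^k * r_{t+k} + gamma^N * bootstrap.
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """rewards: [r_t, ..., r_{t+N-1}]; bootstrap_value ~ Q(s_{t+N}, pi(s_{t+N}))."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target

# Example: 3-step return with rewards [1.0, 0.0, 2.0] and bootstrap 5.0
# = 1.0 + 0.99*0.0 + 0.99**2 * 2.0 + 0.99**3 * 5.0 ≈ 7.81
print(n_step_target([1.0, 0.0, 2.0], 5.0))
```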