update notations

Trance-0
2025-11-04 12:43:23 -06:00
parent d24c0bdd9e
commit 614479e4d0
27 changed files with 333 additions and 100 deletions

@@ -53,7 +53,7 @@ $$
Action-Value Actor-Critic
- Simple actor-critic algorithm based on action-value critic
-- Using linear value function approximation $Q_w(s,a)=\phi(s,a)^T w$
+- Using linear value function approximation $Q_w(s,a)=\phi(s,a)^\top w$
Critic: updates $w$ by linear $TD(0)$
Actor: updates $\theta$ by policy gradient
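
For reference, one update step of the action-value actor-critic described in this hunk might look like the sketch below. It is an illustration, not code from this repository. It assumes one-hot features for $\phi(s,a)$, a softmax policy $\pi_\theta(a|s) \propto \exp(\phi(s,a)^\top \theta)$, and hypothetical names and defaults (`phi`, `policy`, `qac_step`, `N_STATES`, `N_ACTIONS`, step sizes `alpha` and `beta`, discount `gamma`).

```python
import math

# Hypothetical problem size, chosen only for illustration.
N_STATES = 2
N_ACTIONS = 2


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def phi(s, a):
    """One-hot feature vector phi(s, a) (an assumed feature choice)."""
    vec = [0.0] * (N_STATES * N_ACTIONS)
    vec[s * N_ACTIONS + a] = 1.0
    return vec


def policy(theta, s):
    """Softmax policy: pi(a|s) proportional to exp(phi(s,a)^T theta)."""
    prefs = [dot(phi(s, a), theta) for a in range(N_ACTIONS)]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def qac_step(w, theta, s, a, r, s_next, a_next,
             alpha=0.1, beta=0.1, gamma=0.9):
    """One actor-critic step on the transition (s, a, r, s_next, a_next)."""
    feats = phi(s, a)
    q_sa = dot(feats, w)                   # Q_w(s, a) = phi(s, a)^T w
    q_next = dot(phi(s_next, a_next), w)   # Q_w(s', a')
    delta = r + gamma * q_next - q_sa      # TD(0) error

    # Score function of the softmax policy:
    # grad log pi(a|s) = phi(s,a) - sum_b pi(b|s) phi(s,b)
    probs = policy(theta, s)
    expected = [
        sum(probs[b] * phi(s, b)[i] for b in range(N_ACTIONS))
        for i in range(len(theta))
    ]
    score = [f - e for f, e in zip(feats, expected)]

    # Actor: policy-gradient step weighted by the critic's estimate Q_w(s, a).
    new_theta = [t + alpha * g * q_sa for t, g in zip(theta, score)]
    # Critic: linear TD(0) step on w.
    new_w = [wi + beta * delta * f for wi, f in zip(w, feats)]
    return new_w, new_theta, delta
```

In a full loop, `a_next` would be sampled from `policy(theta, s_next)` and the step repeated along each trajectory. The function returns new lists rather than mutating `w` and `theta` in place, which keeps a single step easy to check.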