update notations
@@ -53,7 +53,7 @@ $$
Action-Value Actor-Critic
- Simple actor-critic algorithm based on action-value critic
-- Using linear value function approximation $Q_w(s,a)=\phi(s,a)^T w$
+- Using linear value function approximation $Q_w(s,a)=\phi(s,a)^\top w$
Critic: updates $w$ by linear $TD(0)$
Actor: updates $\theta$ by policy gradient
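
The critic and actor updates named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the notes' own implementation: the one-hot feature map `features`, the softmax policy with linear preferences, the problem sizes `N_STATES` and `N_ACTIONS`, and the step sizes `alpha` and `beta` are all hypothetical choices made for the example. The critic takes a linear TD(0) step on $w$ and the actor takes a policy-gradient step on $\theta$ using $Q_w(s,a)=\phi(s,a)^\top w$.

```python
import numpy as np

# Hypothetical problem sizes, chosen only for this sketch.
N_STATES, N_ACTIONS = 2, 2


def features(s, a):
    """One-hot phi(s, a); an assumed feature map for illustration."""
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi


def policy(theta, s):
    """Softmax policy pi_theta(. | s) with linear preferences phi(s, a)^T theta."""
    prefs = np.array([features(s, b) @ theta for b in range(N_ACTIONS)])
    prefs -= prefs.max()  # shift for numerical stability
    p = np.exp(prefs)
    return p / p.sum()


def score(theta, s, a):
    """grad_theta log pi_theta(a | s) = phi(s, a) - sum_b pi(b | s) phi(s, b)."""
    p = policy(theta, s)
    expected = sum(p[b] * features(s, b) for b in range(N_ACTIONS))
    return features(s, a) - expected


def qac_step(theta, w, s, a, r, s_next, a_next, done,
             alpha=0.1, beta=0.5, gamma=0.9):
    """One action-value actor-critic update from a transition (s, a, r, s', a')."""
    phi = features(s, a)
    q = phi @ w                                    # Q_w(s, a) = phi(s, a)^T w
    q_next = 0.0 if done else features(s_next, a_next) @ w
    delta = r + gamma * q_next - q                 # TD(0) error
    theta_new = theta + alpha * score(theta, s, a) * q  # actor: policy gradient
    w_new = w + beta * delta * phi                 # critic: linear TD(0)
    return theta_new, w_new, delta
```

In a full training loop, `a_next` would be sampled from `policy(theta, s_next)` and then reused as the action for the following step, so the critic is updated on-policy.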