update notations

2025-11-04 12:43:23 -06:00
parent d24c0bdd9e
commit 614479e4d0
27 changed files with 333 additions and 100 deletions
--- a/content/CSE510/CSE510_L12.md
+++ b/content/CSE510/CSE510_L12.md
@@ -53,7 +53,7 @@ $$
 Action-Value Actor-Critic

 - Simple actor-critic algorithm based on action-value critic
- Using linear value function approximation $Q_w(s,a)=\phi(s,a)^T w$
+- Using linear value function approximation $Q_w(s,a)=\phi(s,a)^\top w$

 Critic: updates $w$ by linear $TD(0)$
 Actor: updates $\theta$ by policy gradient
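
Note on the hunk above: the critic and actor updates it describes can be sketched as a single step in plain Python. This is a minimal illustration, not code from the lecture notes. It assumes one-hot features $\phi(s,a)$, a softmax policy over $\phi(s,a)^\top \theta$, and step sizes `alpha` (actor) and `beta` (critic); the names `phi`, `policy`, `score`, `qac_step`, `N_STATES`, and `N_ACTIONS` are all hypothetical.

```python
import math

# Assumed toy sizes, for illustration only.
N_STATES, N_ACTIONS = 2, 2


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def phi(s, a):
    """One-hot feature vector for the pair (s, a) (an assumed feature choice)."""
    vec = [0.0] * (N_STATES * N_ACTIONS)
    vec[s * N_ACTIONS + a] = 1.0
    return vec


def policy(theta, s):
    """Softmax policy: pi_theta(a|s) proportional to exp(phi(s,a)^T theta)."""
    prefs = [dot(phi(s, a), theta) for a in range(N_ACTIONS)]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def score(theta, s, a):
    """grad_theta log pi_theta(a|s) = phi(s,a) - sum_b pi_theta(b|s) phi(s,b)."""
    g = phi(s, a)
    for b, p in enumerate(policy(theta, s)):
        g = [gi - p * fi for gi, fi in zip(g, phi(s, b))]
    return g


def qac_step(theta, w, s, a, r, s_next, a_next, done, alpha, beta, gamma):
    """One actor-critic step with the linear critic Q_w(s,a) = phi(s,a)^T w."""
    x = phi(s, a)
    q = dot(x, w)
    q_next = 0.0 if done else dot(phi(s_next, a_next), w)
    delta = r + gamma * q_next - q  # TD(0) error
    g = score(theta, s, a)
    # Actor: policy-gradient step, using the critic's Q_w(s,a) as the return estimate.
    new_theta = [t + alpha * gi * q for t, gi in zip(theta, g)]
    # Critic: linear TD(0) step on w.
    new_w = [wi + beta * delta * xi for wi, xi in zip(w, x)]
    return new_theta, new_w, delta


theta = [0.0] * (N_STATES * N_ACTIONS)
w = [0.0] * (N_STATES * N_ACTIONS)
theta, w, delta = qac_step(theta, w, s=0, a=1, r=1.0, s_next=1, a_next=0,
                           done=False, alpha=0.1, beta=0.5, gamma=0.9)
```

With all parameters initialised to zero, the first step leaves $\theta$ unchanged because $Q_w(s,a)=0$, and only the critic weight for the visited pair moves. The actor starts to update once the critic holds a nonzero estimate.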