breaking update
@@ -1,4 +1,4 @@
-# CSE510 Lecture 6
+# CSE510 Deep Reinforcement Learning (Lecture 6)
 
 ## Active reinforcement learning
 
@@ -242,6 +242,6 @@ From the example we see that it can take many learning trials for the final rewa
 $$
 5. Goto 2
 
-> [!NOTES]
+> [!NOTE]
 >
 > Compared with Q-learning, SARSA (on-policy) usually takes more "safer" actions.