breaking update

2025-09-16 10:29:48 -05:00
parent 03baf25685
commit 5e7b8a141d
4 changed files with 93 additions and 3 deletions
--- a/content/CSE510/CSE510_L5.md
+++ b/content/CSE510/CSE510_L5.md
@@ -1,4 +1,4 @@
-# CSE510 Lecture 5
+# CSE510 Deep Reinforcement Learning (Lecture 5)

 ## Passive Reinforcement Learning

--- a/content/CSE510/CSE510_L6.md
+++ b/content/CSE510/CSE510_L6.md
@@ -1,4 +1,4 @@
-# CSE510 Lecture 6
+# CSE510 Deep Reinforcement Learning (Lecture 6)

 ## Active reinforcement learning

@@ -242,6 +242,6 @@ From the example we see that it can take many learning trials for the final rewa
   $$
 5. Goto 2

-> [!NOTES]
+> [!NOTE]
 >
 > Compared with Q-learning, SARSA (on-policy) usually takes "safer" actions.
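
Review note on the Lecture 6 hunk above: the SARSA versus Q-learning remark comes down to which value each method bootstraps from. The sketch below is only an illustration under assumed names (`q_learning_update`, `sarsa_update`, the states `start` and `edge`, the actions `safe` and `risky`, and the values of `alpha` and `gamma` are all hypothetical and not taken from the notes).

```python
from collections import defaultdict

def q_learning_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    # Off-policy target: best action value in s_next, whatever is actually taken.
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    # On-policy target: value of the action the behavior policy really chose.
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])

actions = ["safe", "risky"]

def fresh_q():
    # Hypothetical starting table: the next state has one good and one bad action.
    q = defaultdict(float)
    q[("edge", "safe")] = 1.0
    q[("edge", "risky")] = -10.0
    return q

q_ql, q_sa = fresh_q(), fresh_q()
# Same transition for both; the exploring policy happens to pick "risky" next.
q_learning_update(q_ql, "start", "forward", 0.0, "edge", actions)
sarsa_update(q_sa, "start", "forward", 0.0, "edge", "risky")
print(q_ql[("start", "forward")], q_sa[("start", "forward")])
```

With these made-up numbers, Q-learning raises the value of moving toward `edge` (about 0.45) because it assumes the greedy action follows, while SARSA lowers it (about -4.5) because the exploratory action it actually took was costly. That is one way to read why SARSA tends to steer away from states where exploration is dangerous.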
--- a/content/CSE510/CSE510_L7.md
+++ b/content/CSE510/CSE510_L7.md
@@ -0,0 +1,89 @@
+# CSE510 Deep Reinforcement Learning (Lecture 7)
+
+## Large Scale RL
+
+So far we have represented value functions by a lookup table:
+
+- Every state s has an entry V(s), or
+- Every state-action pair (s, a) has an entry Q(s, a)
+
+Reinforcement learning should be used to solve large problems, e.g.
+
+- Backgammon: $10^{20}$ states
+- Computer Go: $10^{170}$ states
+- Helicopter, robot, ...: enormous continuous state space
+
+Tabular methods clearly cannot handle this. Why?
+
+- There are too many states and/or actions to store in memory
+- It is too slow to learn the value of each state individually
+- You cannot generalize across states!
+
+### Value Function Approximation (VFA)
+
+Solution for large MDPs:
+
+- Estimate the value function using a function approximator
+
+**Value function approximation (VFA)** replaces the table with a general parameterized form:
+
+$$
+\hat{V}(s, \theta) \approx V_\pi(s)
+$$
+
+or
+
+$$
+\hat{Q}(s, a, \theta) \approx Q_\pi(s, a)
+$$
+
+Benefits:
+
+- Can generalize across states
+- Saves memory (only the function approximator's parameters need to be stored)
+
+### End-to-End RL
+
+End-to-end RL methods replace the hand-designed state representation with raw observations.
+
+- Good: We get rid of manual design of state representations
+- Bad: we need far more data to train the network, since the raw observation $O_t$ is usually much higher-dimensional than a hand-designed state $S_t$
+
+## Function Approximation
+
+- Linear function approximation
+- Neural network function approximation
+- Decision tree function approximation
+- Nearest neighbor
+- ...
+
+In this course, we will focus on **linear combinations of features** and **neural networks**.
+
+Today we cover deep neural networks (fully connected and convolutional).
+
+### Artificial Neural Networks
+
+#### Neuron
+
+$f: \mathbb{R}^k\to \mathbb{R}$
+
+$z=a_1w_1+a_2w_2+\cdots+a_kw_k+b$
+
+$a_1,a_2,\cdots,a_k$ are the inputs, $w_1,w_2,\cdots,w_k$ are the weights, $b$ is the bias.
+
+The neuron's output is then $\sigma(z)$, where $\sigma$ is an activation function (usually non-linear).
+
+##### Activation functions
+
+Both activations below produce non-negative outputs: ReLU maps to $[0, \infty)$ and sigmoid maps to $(0, 1)$.
+
+- ReLU (rectified linear unit):
+  - $$
+    \text{ReLU}(x) = \max(0, x)
+    $$
+- Sigmoid:
+  - $$
+    \text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}
+    $$
+
+
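
Review note on the Lecture 7 hunk above: the neuron definition ($z = a_1w_1 + \cdots + a_kw_k + b$ followed by an activation) can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the course; the function names and the sample inputs, weights, and bias are assumptions chosen for the example.

```python
import math

def relu(x):
    # ReLU(x) = max(0, x)
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus the bias, passed through the activation.
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(a * w for a, w in zip(inputs, weights)) + bias
    return activation(z)

# Hypothetical values: z = 1.0*0.5 + 2.0*(-1.0) + 0.5 = -1.0
inputs, weights, bias = [1.0, 2.0], [0.5, -1.0], 0.5
relu_out = neuron(inputs, weights, bias, relu)
sigmoid_out = neuron(inputs, weights, bias, sigmoid)
print(relu_out, sigmoid_out)
```

Here the pre-activation is negative, so ReLU clips it to 0 while sigmoid squashes it to a value strictly between 0 and 0.5.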
--- a/content/CSE510/_meta.js
+++ b/content/CSE510/_meta.js
@@ -9,4 +9,5 @@ export default {
    CSE510_L4: "CSE510 Deep Reinforcement Learning (Lecture 4)",
    CSE510_L5: "CSE510 Deep Reinforcement Learning (Lecture 5)",
    CSE510_L6: "CSE510 Deep Reinforcement Learning (Lecture 6)",
+    CSE510_L7: "CSE510 Deep Reinforcement Learning (Lecture 7)",
 }
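
Review note on the value function approximation section added in `CSE510_L7.md`: the two listed benefits (generalizing across states, storing only parameters) can be seen in a tiny linear approximator $\hat{V}(s, \theta) = \theta^\top \phi(s)$. The sketch below is illustrative only. The feature map `features`, the state range, the step size, and the stochastic-gradient update on squared error are assumptions brought in for the example; the notes in this commit do not specify a training rule.

```python
def features(s, max_state=10):
    # Hypothetical features for an integer state: a bias term and a scaled position.
    return [1.0, s / max_state]

def v_hat(s, theta):
    # Linear value estimate: dot product of parameters and features.
    return sum(t * f for t, f in zip(theta, features(s)))

def sgd_step(s, target, theta, alpha=0.5):
    # Assumed update: one gradient step on the squared error (target - v_hat)^2 / 2.
    error = target - v_hat(s, theta)
    return [t + alpha * error * f for t, f in zip(theta, features(s))]

theta = [0.0, 0.0]                # two parameters, regardless of how many states exist
theta = sgd_step(10, 1.0, theta)  # learn from a single visit to state 10
print(v_hat(10, theta), v_hat(5, theta))
```

After one update on state 10, the estimate for the never-visited state 5 has also moved (to 0.75 with these numbers), which a lookup table could not do, and only two numbers are stored instead of one entry per state.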