diff --git a/content/CSE510/CSE510_L5.md b/content/CSE510/CSE510_L5.md
index 5b8b858..d651270 100644
--- a/content/CSE510/CSE510_L5.md
+++ b/content/CSE510/CSE510_L5.md
@@ -1,4 +1,4 @@
-# CSE510 Lecture 5
+# CSE510 Deep Reinforcement Learning (Lecture 5)
 
 ## Passive Reinforcement Learning
 
diff --git a/content/CSE510/CSE510_L6.md b/content/CSE510/CSE510_L6.md
index 2e9a0cf..13cb269 100644
--- a/content/CSE510/CSE510_L6.md
+++ b/content/CSE510/CSE510_L6.md
@@ -1,4 +1,4 @@
-# CSE510 Lecture 6
+# CSE510 Deep Reinforcement Learning (Lecture 6)
 
 ## Active reinforcement learning
 
@@ -242,6 +242,6 @@ From the example we see that it can take many learning trials for the final rewa
 $$
 5. Goto 2
 
-> [!NOTES]
+> [!NOTE]
 >
 > Compared with Q-learning, SARSA (on-policy) usually takes more "safer" actions.
diff --git a/content/CSE510/CSE510_L7.md b/content/CSE510/CSE510_L7.md
new file mode 100644
index 0000000..5a1cde7
--- /dev/null
+++ b/content/CSE510/CSE510_L7.md
@@ -0,0 +1,89 @@
+# CSE510 Deep Reinforcement Learning (Lecture 7)
+
+## Large Scale RL
+
+So far we have represented value functions by a lookup table:
+
+- Every state s has an entry V(s), or
+- Every state-action pair (s, a) has an entry Q(s, a)
+
+Reinforcement learning should be able to solve large problems, e.g.
+
+- Backgammon: 10^20 states
+- Computer Go: 10^170 states
+- Helicopter, robot, ...: enormous continuous state space
+
+Tabular methods clearly cannot handle this. Why?
+
+- There are too many states and/or actions to store in memory
+- It is too slow to learn the value of each state individually
+- You cannot generalize across states!
+
+### Value Function Approximation (VFA)
+
+Solution for large MDPs:
+
+- Estimate the value function using a function approximator
+
+**Value function approximation (VFA)** replaces the table with a general parameterized form:
+
+$$
+\hat{V}(s, \theta) \approx V_\pi(s)
+$$
+
+or
+
+$$
+\hat{Q}(s, a, \theta) \approx Q_\pi(s, a)
+$$
+
+Benefits:
+
+- Can generalize across states
+- Save memory (only need to store the function approximator parameters)
+
+### End-to-End RL
+
+End-to-end RL methods replace the hand-designed state representation with raw observations.
+
+- Good: we get rid of manual design of state representations
+- Bad: we need tons of data to train the network, since the raw observation O_t is usually far higher dimensional than a hand-designed state S_t
+
+## Function Approximation
+
+- Linear function approximation
+- Neural network function approximation
+- Decision tree function approximation
+- Nearest neighbor
+- ...
+
+In this course, we will focus on **linear combinations of features** and **neural networks**.
+
+Today we will cover deep neural networks (fully connected and convolutional).
+
+### Artificial Neural Networks
+
+#### Neuron
+
+A neuron computes a function $f: \mathbb{R}^k \to \mathbb{R}$:
+
+$z = a_1 w_1 + a_2 w_2 + \cdots + a_k w_k + b$
+
+$a_1, a_2, \cdots, a_k$ are the inputs, $w_1, w_2, \cdots, w_k$ are the weights, $b$ is the bias.
+
+Then we apply an activation function $\sigma(z)$ (usually non-linear).
+
+##### Activation functions
+
+Both activations below have outputs that are never negative (the sigmoid's output is always positive, lying in $(0, 1)$):
+
+- ReLU (rectified linear unit):
+  - $$
+    \text{ReLU}(x) = \max(0, x)
+    $$
+- Sigmoid:
+  - $$
+    \text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}
+    $$
+
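+A minimal NumPy sketch of the single neuron defined above (the function and variable names here are only illustrative): it computes $z = a_1 w_1 + \cdots + a_k w_k + b$ and then applies ReLU or sigmoid as the activation $\sigma(z)$.
+
+```python
+# Minimal sketch of a single neuron with ReLU / sigmoid activations (illustrative)
+import numpy as np
+
+def relu(z):
+    # ReLU(z) = max(0, z), applied element-wise
+    return np.maximum(0.0, z)
+
+def sigmoid(z):
+    # Sigmoid(z) = 1 / (1 + e^(-z)), output lies in (0, 1)
+    return 1.0 / (1.0 + np.exp(-z))
+
+def neuron(a, w, b, activation=relu):
+    # z = a_1*w_1 + ... + a_k*w_k + b, then apply the activation sigma(z)
+    z = np.dot(a, w) + b
+    return activation(z)
+
+# A neuron with k = 3 inputs
+a = np.array([1.0, -2.0, 0.5])   # inputs a_1, ..., a_k
+w = np.array([0.4, 0.1, -0.3])   # weights w_1, ..., w_k
+b = 0.2                          # bias
+
+print(neuron(a, w, b, relu))     # ReLU activation
+print(neuron(a, w, b, sigmoid))  # sigmoid activation
+```
+
+The same building block, stacked in layers, gives the deep networks we will use as value function approximators $\hat{V}(s, \theta)$ and $\hat{Q}(s, a, \theta)$.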
diff --git a/content/CSE510/_meta.js b/content/CSE510/_meta.js
index c24e567..fe7a1e0 100644
--- a/content/CSE510/_meta.js
+++ b/content/CSE510/_meta.js
@@ -9,4 +9,5 @@ export default {
   CSE510_L4: "CSE510 Deep Reinforcement Learning (Lecture 4)",
   CSE510_L5: "CSE510 Deep Reinforcement Learning (Lecture 5)",
   CSE510_L6: "CSE510 Deep Reinforcement Learning (Lecture 6)",
+  CSE510_L7: "CSE510 Deep Reinforcement Learning (Lecture 7)",
 }
\ No newline at end of file