breaking update
content/CSE510/CSE510_L5.md
@@ -1,4 +1,4 @@
-# CSE510 Lecture 5
+# CSE510 Deep Reinforcement Learning (Lecture 5)
 
 ## Passive Reinforcement Learning
 
content/CSE510/CSE510_L6.md
@@ -1,4 +1,4 @@
-# CSE510 Lecture 6
+# CSE510 Deep Reinforcement Learning (Lecture 6)
 
 ## Active reinforcement learning
 
@@ -242,6 +242,6 @@ From the example we see that it can take many learning trials for the final rewa
 $$
 5. Goto 2
 
-> [!NOTES]
+> [!NOTE]
 >
 > Compared with Q-learning, SARSA (on-policy) usually takes safer actions.
content/CSE510/CSE510_L7.md (new file, 89 lines)
@@ -0,0 +1,89 @@
# CSE510 Deep Reinforcement Learning (Lecture 7)

## Large Scale RL

So far we have represented value functions by a lookup table:

- Every state s has an entry V(s), or
- Every state-action pair (s, a) has an entry Q(s, a)
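
As a toy illustration (the state and action names are made up, not from the lecture), a tabular representation is literally a dictionary with one explicit entry per state or state-action pair:

```python
# Tabular value functions: one explicit entry per state / state-action pair.
V = {}                        # V[s]      -> value estimate for state s
Q = {}                        # Q[(s, a)] -> value estimate for pair (s, a)

V["s0"] = 0.0                 # every state needs its own entry
Q[("s0", "left")] = 0.0      # every state-action pair needs its own entry
Q[("s0", "right")] = 0.5

# Lookup is trivial, but nothing carries over to states we never visited:
print(Q.get(("s1", "left")))  # None -- no generalization across states
```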

Reinforcement learning should be able to solve large problems, e.g.

- Backgammon: 10^20 states
- Computer Go: 10^170 states
- Helicopter, robot, ...: enormous continuous state space

Tabular methods clearly cannot handle this. Why?

- There are too many states and/or actions to store in memory
- It is too slow to learn the value of each state individually
- You cannot generalize across states!

### Value Function Approximation (VFA)

Solution for large MDPs:

- Estimate the value function using a function approximator

**Value function approximation (VFA)** replaces the table with a general parameterized form:

$$
\hat{V}(s, \theta) \approx V_\pi(s)
$$

or

$$
\hat{Q}(s, a, \theta) \approx Q_\pi(s, a)
$$

Benefits:

- Can generalize across states
- Save memory (only need to store the function approximator parameters)
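
To make the parameterized form concrete, here is a minimal sketch of a *linear* VFA, $\hat{V}(s, \theta) = \theta^\top \phi(s)$; the feature map $\phi$ and the scalar states are assumptions made up for illustration:

```python
import numpy as np

def phi(s):
    """Hypothetical feature map: a few hand-picked features of a scalar state."""
    return np.array([1.0, s, s ** 2])

theta = np.zeros(3)  # parameters shared across ALL states

def v_hat(s):
    """Approximate value: V_hat(s, theta) = theta . phi(s)."""
    return theta @ phi(s)

# Updating theta after seeing state s = 1.0 also changes the estimate for
# the nearby state s = 1.1 -- generalization across states -- while only
# len(theta) = 3 numbers are stored, not one entry per state.
theta += 0.1 * phi(1.0)        # toy parameter update
print(v_hat(1.0), v_hat(1.1))  # both estimates moved
```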

### End-to-End RL

End-to-end RL methods replace the hand-designed state representation with raw observations.

- Good: we get rid of manual design of state representations
- Bad: we need tons of data to train the network, since the raw observation $O_t$ is usually far higher-dimensional than a hand-designed state $S_t$

## Function Approximation

- Linear function approximation
- Neural network function approximation
- Decision tree function approximation
- Nearest neighbor
- ...

In this course, we will focus on **linear combinations of features** and **neural networks**.

Today we will cover deep neural networks (fully connected and convolutional).

### Artificial Neural Networks

#### Neuron

A neuron computes a function $f: \mathbb{R}^k \to \mathbb{R}$:

$z = a_1 w_1 + a_2 w_2 + \cdots + a_k w_k + b$

$a_1, a_2, \cdots, a_k$ are the inputs, $w_1, w_2, \cdots, w_k$ are the weights, and $b$ is the bias.

We then apply an activation function $\sigma(z)$ (usually non-linear) to produce the neuron's output.

##### Activation functions
- ReLU (rectified linear unit):

  $$
  \text{ReLU}(x) = \max(0, x)
  $$

- Sigmoid (its output is always positive):

  $$
  \text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}
  $$
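
A minimal sketch of one neuron with these two activations (the input, weight, and bias values are arbitrary illustrations):

```python
import numpy as np

def relu(z):
    """ReLU(z) = max(0, z)."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Sigmoid(z) = 1 / (1 + exp(-z)); output is always in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(a, w, b, activation):
    """One neuron: z = a_1*w_1 + ... + a_k*w_k + b, then sigma(z)."""
    z = np.dot(w, a) + b
    return activation(z)

a = np.array([0.5, -1.0, 2.0])   # inputs  a_1, ..., a_k
w = np.array([0.1,  0.4, -0.2])  # weights w_1, ..., w_k
b = 0.05                         # bias

print(neuron(a, w, b, relu))     # here z = -0.7, so ReLU gives 0.0
print(neuron(a, w, b, sigmoid))  # sigmoid(-0.7) ~ 0.33, strictly in (0, 1)
```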
@@ -9,4 +9,5 @@ export default {
   CSE510_L4: "CSE510 Deep Reinforcement Learning (Lecture 4)",
   CSE510_L5: "CSE510 Deep Reinforcement Learning (Lecture 5)",
   CSE510_L6: "CSE510 Deep Reinforcement Learning (Lecture 6)",
+  CSE510_L7: "CSE510 Deep Reinforcement Learning (Lecture 7)",
 }