update distribute script

Trance-0
2025-11-25 10:19:06 -06:00
parent 3242bfc299
commit 34182ff139
3 changed files with 51 additions and 0 deletions


@@ -0,0 +1,38 @@
# CSE510 Deep Reinforcement Learning (Lecture 26)
## Continue on Real-World Practical Challenges for RL
### Factored multi-agent RL
- Sample efficiency -> Shared Learning
- Complexity -> High-Order Factorization
- Partial Observability -> Communication Learning
- Sparse reward -> Coordinated Exploration
#### Parameter Sharing vs. Diversity
- Parameter Sharing is critical for deep MARL methods
- However, agents tend to acquire homogeneous behaviors
- Diversity is essential for exploration and practical tasks
[link to paper: Google Football](https://arxiv.org/pdf/1907.11180)
Schematics of Our Approach: Celebrating Diversity in Shared MARL (CDS)
- In representation, CDS allows MARL to adaptively decide when to share learning
- Encouraging Diversity in Optimization
In optimization, CDS maximizes an information-theoretic objective, the mutual information between an agent's trajectory $\tau_T$ and its identity $id$, to achieve identity-aware diversity:
$$
\begin{aligned}
I^\pi(\tau_T;id)&=H(\tau_T)-H(\tau_T|id)=\mathbb{E}_{id,\tau_T\sim \pi}\left[\log \frac{p(\tau_T|id)}{p(\tau_T)}\right]\\
&= \mathbb{E}_{id,\tau}\left[ \log \frac{p(o_0|id)}{p(o_0)}+\sum_{t=0}^{T-1}\left(\log\frac{p(a_t|\tau_t,id)}{p(a_t|\tau_t)}+\log \frac{p(o_{t+1}|\tau_t,a_t,id)}{p(o_{t+1}|\tau_t,a_t)}\right)\right]
\end{aligned}
$$
Here, $\sum_{t=0}^{T-1}\log\frac{p(a_t|\tau_t,id)}{p(a_t|\tau_t)}$ represents the action diversity, and
$\log \frac{p(o_{t+1}|\tau_t,a_t,id)}{p(o_{t+1}|\tau_t,a_t)}$ represents the observation diversity.
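
To make the action-diversity term concrete, here is a minimal Python sketch that evaluates $\mathbb{E}_{id,a}\left[\log\frac{p(a|\tau,id)}{p(a|\tau)}\right]$ at a single fixed trajectory $\tau$ for a small discrete action space. It is an illustration only, not the CDS implementation: the function name `identity_action_information` and the toy policies are hypothetical, and the marginal $p(a|\tau)$ is approximated by averaging the per-agent policies, which assumes agent identities are uniformly distributed given $\tau$.

```python
import math


def identity_action_information(policies):
    """Mutual information between action and agent id at one fixed trajectory.

    policies[i][a] is p(a | tau, id=i). Assumption (not from the lecture):
    ids are uniform given tau, so p(a | tau) is the average over agents.
    """
    n_agents = len(policies)
    n_actions = len(policies[0])
    # Marginal action distribution p(a | tau) under the uniform-id assumption.
    marginal = [sum(p[a] for p in policies) / n_agents for a in range(n_actions)]
    info = 0.0
    for p in policies:
        for a in range(n_actions):
            if p[a] > 0.0:  # 0 * log(0) is treated as 0
                info += (1.0 / n_agents) * p[a] * math.log(p[a] / marginal[a])
    return info


# Two agents sharing one policy: actions reveal nothing about identity.
homogeneous = [[0.5, 0.5], [0.5, 0.5]]
# Two agents with distinct deterministic policies: actions identify the agent.
diverse = [[1.0, 0.0], [0.0, 1.0]]

print(identity_action_information(homogeneous))
print(identity_action_information(diverse))
```

Under these assumptions, fully shared behavior gives zero information, while two agents that always pick different actions reach the maximum of $\log 2$ nats, which is the sense in which maximizing this term pushes agents toward identity-aware diversity.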


@@ -28,4 +28,5 @@ export default {
CSE510_L23: "CSE510 Deep Reinforcement Learning (Lecture 23)",
CSE510_L24: "CSE510 Deep Reinforcement Learning (Lecture 24)",
CSE510_L25: "CSE510 Deep Reinforcement Learning (Lecture 25)",
CSE510_L26: "CSE510 Deep Reinforcement Learning (Lecture 26)",
}