
CSE510 Deep Reinforcement Learning (Lecture 26)

Continuing with Real-World Practical Challenges for RL

Factored multi-agent RL

  • Sample efficiency -> Shared Learning
  • Complexity -> High-Order Factorization (see the factorization sketch after this list)
  • Partial Observability -> Communication Learning
  • Sparse reward -> Coordinated Exploration
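
To make the "factored" part concrete, below is a minimal value-factorization sketch in PyTorch. The class names, sizes, and the additive VDN-style mixer are illustrative assumptions, not the lecture's specific method: the joint action-value is decomposed into per-agent utilities so joint optimization stays tractable, and the high-order factorization mentioned above would replace the simple sum with a richer mixing function.

```python
import torch
import torch.nn as nn

class AgentUtility(nn.Module):
    """Per-agent utility Q_i(o_i, a_i) computed from the agent's local observation."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # (batch, n_actions)

class AdditiveMixer(nn.Module):
    """First-order (VDN-style) factorization: Q_tot = sum_i Q_i.
    A higher-order factorization replaces this sum with a richer mixing network."""
    def forward(self, chosen_qs):
        # chosen_qs: (batch, n_agents) utilities of the actions actually taken
        return chosen_qs.sum(dim=1)

# Usage with assumed sizes: 3 agents, local observation dim 16, 5 discrete actions.
n_agents, obs_dim, n_actions = 3, 16, 5
utilities = nn.ModuleList(AgentUtility(obs_dim, n_actions) for _ in range(n_agents))
mixer = AdditiveMixer()

obs = torch.randn(8, n_agents, obs_dim)               # batch of joint observations
actions = torch.randint(0, n_actions, (8, n_agents))  # batch of joint actions
chosen_qs = torch.stack(
    [utilities[i](obs[:, i]).gather(1, actions[:, i:i+1]).squeeze(1)
     for i in range(n_agents)],
    dim=1,
)
q_tot = mixer(chosen_qs)  # (8,) joint value, trained end-to-end with a TD loss
```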

Parameter Sharing vs. Diversity

  • Parameter Sharing is critical for deep MARL methods
  • However, agents tend to acquire homogeneous behaviors (see the sketch after this list)
  • Diversity is essential for exploration and practical tasks
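
To see why sharing helps sample efficiency yet pushes agents toward homogeneity, here is a minimal PyTorch sketch (names and dimensions are assumptions, not from the lecture): every agent acts through one shared network, so each agent's experience updates the parameters used by all agents, and only the appended agent-id input lets behaviors differ.

```python
import torch
import torch.nn as nn

class SharedPolicy(nn.Module):
    """One policy network whose parameters are shared by all agents.
    Concatenating a one-hot agent id lets behaviors differ per agent;
    without it, every agent maps the same observation to the same action
    distribution, which is the homogeneity problem noted above."""
    def __init__(self, obs_dim: int, n_agents: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.n_agents = n_agents
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_agents, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, agent_idx: int):
        one_hot = torch.zeros(obs.shape[0], self.n_agents, device=obs.device)
        one_hot[:, agent_idx] = 1.0
        logits = self.net(torch.cat([obs, one_hot], dim=-1))
        return torch.distributions.Categorical(logits=logits).sample()

# Usage with assumed sizes: 4 agents sharing one set of weights.
policy = SharedPolicy(obs_dim=10, n_agents=4, n_actions=6)
obs = torch.randn(32, 10)
actions = [policy(obs, i) for i in range(4)]  # one forward pass per agent, same parameters
```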

link to paper: Google Football

Schematics of Our Approach: Celebrating Diversity in Shared MARL (CDS)

  • In representation, CDS allows MARL to adaptively decide when to share learning
  • Encouraging Diversity in Optimization

In optimization, CDS maximizes an information-theoretic objective to achieve identity-aware diversity:


$$
\begin{aligned}
I^\pi(\tau_T; id) &= H(\tau_T) - H(\tau_T \mid id) = \mathbb{E}_{id,\, \tau_T \sim \pi}\left[\log \frac{p(\tau_T \mid id)}{p(\tau_T)}\right]\\
&= \mathbb{E}_{id,\, \tau_T \sim \pi}\left[ \log \frac{p(o_0 \mid id)}{p(o_0)} + \sum_{t=0}^{T-1}\left(\log\frac{p(a_t \mid \tau_t, id)}{p(a_t \mid \tau_t)} + \log \frac{p(o_{t+1} \mid \tau_t, a_t, id)}{p(o_{t+1} \mid \tau_t, a_t)}\right)\right]
\end{aligned}
$$

Here, the term $\log\frac{p(a_t \mid \tau_t, id)}{p(a_t \mid \tau_t)}$ represents action diversity: how much more likely the taken action is under the agent's own identity than under the identity-agnostic policy.

The term $\log \frac{p(o_{t+1} \mid \tau_t, a_t, id)}{p(o_{t+1} \mid \tau_t, a_t)}$ represents observation diversity: how much the agent's identity changes what it expects to observe next.
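
As a rough illustration of how the action-diversity term could be computed in practice (a simplified sketch with assumed names and shapes, not the CDS implementation), the log-ratio is estimated with two learned models: an identity-conditioned policy for $p(a_t \mid \tau_t, id)$ and an identity-agnostic one for $p(a_t \mid \tau_t)$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajPolicy(nn.Module):
    """Maps an encoded trajectory tau_t to action log-probabilities.
    With id_dim > 0 the one-hot agent id is appended, modeling p(a_t | tau_t, id);
    with id_dim = 0 it models the identity-agnostic p(a_t | tau_t)."""
    def __init__(self, tau_dim: int, id_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(tau_dim + id_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, tau, agent_id=None):
        x = tau if agent_id is None else torch.cat([tau, agent_id], dim=-1)
        return F.log_softmax(self.net(x), dim=-1)

def action_diversity_bonus(tau, agent_id, actions, cond_policy, marg_policy):
    """log p(a_t|tau_t,id) - log p(a_t|tau_t): large when the taken action is
    more likely under the agent's own identity than under the shared behavior."""
    logp_cond = cond_policy(tau, agent_id).gather(1, actions.unsqueeze(1)).squeeze(1)
    logp_marg = marg_policy(tau).gather(1, actions.unsqueeze(1)).squeeze(1)
    return logp_cond - logp_marg

# Usage with assumed sizes: trajectory embedding dim 32, 3 agents, 5 actions.
tau_dim, n_agents, n_actions = 32, 3, 5
cond_policy = TrajPolicy(tau_dim, n_agents, n_actions)
marg_policy = TrajPolicy(tau_dim, 0, n_actions)

tau = torch.randn(16, tau_dim)
agent_id = F.one_hot(torch.randint(0, n_agents, (16,)), n_agents).float()
actions = torch.randint(0, n_actions, (16,))
bonus = action_diversity_bonus(tau, agent_id, actions, cond_policy, marg_policy)  # (16,)
```

The observation-diversity term could be estimated the same way, replacing the two policies with identity-conditioned and identity-agnostic predictors of the next observation.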