# CSE510 Deep Reinforcement Learning (Lecture 26)

## Continue on Real-World Practical Challenges for RL

### Factored multi-agent RL

- Sample efficiency -> Shared Learning
- Complexity -> High-Order Factorization
- Partial Observability -> Communication Learning
- Sparse reward -> Coordinated Exploration

#### Parameter Sharing vs. Diversity

- Parameter sharing is critical for deep MARL methods
- However, shared parameters lead agents to acquire homogeneous behaviors
- Diversity is essential for exploration and for practical tasks

[Paper: Google Football](https://arxiv.org/pdf/1907.11180)

Schematics of the approach: Celebrating Diversity in Shared MARL (CDS)

- In representation, CDS allows MARL agents to adaptively decide when to share learning
- In optimization, CDS maximizes an information-theoretic objective to achieve identity-aware diversity

$$
\begin{aligned}
I^\pi(\tau_T; id) &= H(\tau_T) - H(\tau_T \mid id) = \mathbb{E}_{id,\, \tau_T \sim \pi}\left[\log \frac{p(\tau_T \mid id)}{p(\tau_T)}\right]\\
&= \mathbb{E}_{id,\, \tau_T}\left[ \log \frac{p(o_0 \mid id)}{p(o_0)} + \sum_{t=0}^{T-1}\left(\log\frac{p(a_t \mid \tau_t, id)}{p(a_t \mid \tau_t)} + \log \frac{p(o_{t+1} \mid \tau_t, a_t, id)}{p(o_{t+1} \mid \tau_t, a_t)}\right)\right]
\end{aligned}
$$

Here, $\sum_{t=0}^{T-1}\log\frac{p(a_t \mid \tau_t, id)}{p(a_t \mid \tau_t)}$ represents the action diversity, and $\log \frac{p(o_{t+1} \mid \tau_t, a_t, id)}{p(o_{t+1} \mid \tau_t, a_t)}$ represents the observation diversity.
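
The sketch below illustrates how the per-timestep action-diversity term could be estimated in practice. It is a minimal illustration, not the CDS implementation: it assumes the identity-free policy $p(a_t \mid \tau_t)$ is approximated by a uniform mixture of the identity-conditioned policies over all agent ids, and the function name `action_diversity_term` and the toy shapes are made up for this example.

```python
import numpy as np

def action_diversity_term(policy_probs, agent_id, action):
    """Estimate log p(a_t | tau_t, id) - log p(a_t | tau_t) for one timestep.

    policy_probs: array of shape (n_agents, n_actions); row i is the
                  identity-conditioned distribution p(. | tau_t, id=i),
                  all evaluated on the same trajectory tau_t.
    agent_id:     index of the agent whose identity we condition on.
    action:       index of the action a_t actually taken.
    """
    # Identity-conditioned probability of the taken action.
    p_cond = policy_probs[agent_id, action]
    # Identity-free probability, approximated as a uniform mixture over ids
    # (an assumption of this sketch, not necessarily how CDS estimates it).
    p_marg = policy_probs[:, action].mean()
    return np.log(p_cond + 1e-8) - np.log(p_marg + 1e-8)


# Toy example: 3 agents, 4 actions, one timestep.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Agent 0 took action 2; a large positive value means agent 0's behavior
# is distinguishable from the "average" agent, i.e. more diverse.
print(action_diversity_term(probs, agent_id=0, action=2))
```

Summing this quantity over $t = 0, \dots, T-1$ (and adding the analogous observation-diversity term) gives a sample-based estimate of the mutual-information objective above; maximizing it pushes agents with shared parameters toward identity-distinguishable behaviors.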