This lecture introduces cooperative multi-agent reinforcement learning, focusing on formal models, value factorization, and modern algorithms such as QMIX and QPLEX.
## Multi-Agent Coordination Under Uncertainty
In cooperative MARL, multiple agents aim to maximize a shared team reward. The environment can be modeled using a Markov game or a Decentralized Partially Observable MDP (Dec-POMDP).
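In one standard formalization (the notation here follows common Dec-POMDP conventions rather than any specific slide), a Dec-POMDP is the tuple

$$
\langle \mathcal{N}, S, \{A_{i}\}_{i=1}^{n}, P, r, \{\Omega_{i}\}_{i=1}^{n}, O, \gamma \rangle
$$

where $\mathcal{N}$ is the set of $n$ agents, $S$ the state space, $A_{i}$ each agent's action space, $P(s' \mid s, \mathbf{a})$ the transition function, $r(s, \mathbf{a})$ the shared team reward, $\Omega_{i}$ and $O$ the observation spaces and observation function, and $\gamma$ the discount factor.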
Training uses global information (centralized), but execution relies only on each agent's local observations (decentralized). This centralized-training, decentralized-execution (CTDE) paradigm is critical for real-world deployment.
## Joint vs Factored Q-Learning
### Joint Q-Learning

Joint Q-learning treats the team as a single meta-agent and learns one value $Q(s, \mathbf{a})$ over the joint action space, which grows exponentially with the number of agents.

### Factored Q-Learning

Factored Q-learning instead learns per-agent utilities $Q_{i}(s, a_{i})$ that are combined into $Q_{tot}$. The goal is to enable decentralized greedy action selection.
## Individual-Global-Max (IGM) Condition
The IGM condition enables decentralized optimal action selection:
$$
\arg\max_{\mathbf{a}} Q_{tot}(s,\mathbf{a}) = \big(\arg\max_{a_{1}} Q_{1}(s,a_{1}), \dots, \arg\max_{a_{n}} Q_{n}(s,a_{n})\big)
$$
IGM makes decentralized execution optimal with respect to the learned factorized value.
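As a concrete check, the sketch below (illustrative only; it uses NumPy and an additive factorization, neither of which is taken from the lecture) verifies on a tiny two-agent problem that the decentralized per-agent argmaxes coincide with the centralized joint argmax:

```python
import numpy as np

# Tiny 2-agent x 3-action problem: an additive factorization Q_tot = Q_1 + Q_2
# satisfies IGM, so per-agent greedy actions must match the joint greedy action.
rng = np.random.default_rng(0)
q1 = rng.normal(size=3)                 # Q_1(s, a_1) for agent 1's actions
q2 = rng.normal(size=3)                 # Q_2(s, a_2) for agent 2's actions
q_tot = q1[:, None] + q2[None, :]       # Q_tot(s, a_1, a_2), shape (3, 3)

# Decentralized greedy selection: each agent maximizes its own utility.
decentralized = (int(np.argmax(q1)), int(np.argmax(q2)))

# Centralized greedy selection: maximize the joint value over all action pairs.
centralized = tuple(int(i) for i in np.unravel_index(np.argmax(q_tot), q_tot.shape))

assert decentralized == centralized     # IGM holds for this additive factorization
print(decentralized, centralized)
```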
## Linear Value Factorization
### VDN (Value Decomposition Networks)
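VDN assumes an additive factorization; written in the document's notation, the joint value is the sum of per-agent utilities:

$$
Q_{tot}(s, \mathbf{a}) = \sum_{i=1}^{n} Q_{i}(s, a_{i})
$$

Because each agent's argmax over its own $Q_{i}$ also maximizes the sum, IGM holds by construction, but only additive team interactions can be represented.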
Cons:
- Limited representation capacity.
- Cannot model non-linear teamwork interactions.
## QMIX: Monotonic Value Factorization
QMIX uses a state-conditioned mixing network enforcing monotonicity:
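In the standard QMIX formulation, the constraint is that the joint value never decreases when any individual utility increases:

$$
\frac{\partial Q_{tot}(s, \mathbf{a})}{\partial Q_{i}(s, a_{i})} \ge 0, \qquad \forall i \in \{1, \dots, n\}
$$

This is typically enforced by generating the mixing weights with state-conditioned hypernetworks and taking their absolute values, so the mixer is monotone in every $Q_{i}$ while still depending non-linearly on the state.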
Benefits:
- More expressive than VDN.
- Supports CTDE while keeping decentralized greedy execution.
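A minimal sketch of such a mixer in PyTorch (simplified to a single mixing layer with hypothetical sizes; real QMIX stacks two hypernetwork-generated layers): non-negative weights produced from the state guarantee the monotonicity constraint above.

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Single-layer QMIX-style mixer: Q_tot is monotone in each agent's Q_i."""

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        # Hypernetworks: the global state decides the mixing weights and biases.
        self.hyper_w = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b = nn.Linear(state_dim, embed_dim)
        self.hyper_v = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                     nn.Linear(embed_dim, 1))
        self.n_agents, self.embed_dim = n_agents, embed_dim

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w = torch.abs(self.hyper_w(state)).view(-1, self.n_agents, self.embed_dim)
        b = self.hyper_b(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w) + b)  # non-negative weights
        v = self.hyper_v(state).view(-1, 1, 1)                        # state-value bias term
        # Summing the hidden features keeps Q_tot monotone in every agent_q.
        return (hidden.sum(dim=2, keepdim=True) + v).squeeze(-1).squeeze(-1)

# Tiny usage example with random tensors.
mixer = MonotonicMixer(n_agents=3, state_dim=8)
q_tot = mixer(torch.randn(4, 3), torch.randn(4, 8))  # shape: (4,)
```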
## Theoretical Issues With Linear and Monotonic Factorization
Limitations:
- QMIX monotonicity limits representation power for tasks requiring non-monotonic interactions.
- Off-policy training can diverge in some factorizations.
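A common illustration of the monotonicity limitation is a one-step cooperative matrix game of the kind used in the QTRAN/QPLEX analyses; the payoff values below are the usual illustrative ones, not taken from the lecture.

|            | $a_{2}=A$ | $a_{2}=B$ | $a_{2}=C$ |
|------------|-----------|-----------|-----------|
| $a_{1}=A$  | 8         | -12       | -12       |
| $a_{1}=B$  | -12       | 0         | 0         |
| $a_{1}=C$  | -12       | 0         | 0         |

The optimal joint action is $(A, A)$, but if a monotonic mixer's greedy joint action were $(A, A)$, then $Q_{1}(A) \ge Q_{1}(B)$ and monotonicity would force $Q_{tot}(A, B) \ge Q_{tot}(B, B)$, contradicting the true payoffs $-12 < 0$. No monotonic factorization can both fit this matrix and keep the optimal action greedy, which is exactly the representational gap QPLEX targets.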
## QPLEX: Duplex Dueling Multi-Agent Q-Learning
QPLEX introduces a dueling architecture that satisfies IGM while providing full representation capacity within the IGM class.
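At a high level (following the published QPLEX formulation; the lecture's exact notation may differ), the joint value is split into a state value and a positively weighted sum of per-agent advantages:

$$
Q_{tot}(s, \mathbf{a}) = V_{tot}(s) + \sum_{i=1}^{n} \lambda_{i}(s, \mathbf{a})\, A_{i}(s, a_{i}), \qquad \lambda_{i}(s, \mathbf{a}) > 0
$$

Because the weights $\lambda_{i}$ are strictly positive and each $A_{i}$ is maximized (at zero) by the agent's own greedy action, the joint argmax coincides with the per-agent argmaxes, so IGM holds while the action-dependent weights recover full expressiveness within the IGM class.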
QPLEX Properties:
- Has full representation capacity for all IGM-consistent Q-functions.
- Enables stable off-policy training.
## QPLEX Training Objective
QPLEX minimizes a TD loss over $Q_{tot}$:
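In the usual one-step deep Q-learning form (target-network parameters $\theta^{-}$ and a replay buffer are standard assumptions here, not spelled out in this excerpt), the objective is

$$
\mathcal{L}(\theta) = \mathbb{E}\Big[\big(r + \gamma \max_{\mathbf{a'}} Q_{tot}(s', \mathbf{a'}; \theta^{-}) - Q_{tot}(s, \mathbf{a}; \theta)\big)^{2}\Big]
$$

Because QPLEX satisfies IGM, the maximization over the joint action $\mathbf{a'}$ in the target reduces to each agent maximizing its own utility.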
Parameter explanations:
- $\mathbf{a'}$: next joint action evaluated in the TD target.
- $Q_{tot}$: QPLEX global value estimate.
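A minimal sketch of computing this loss with a separate target network (names and shapes are placeholders; it reuses the mixer interface sketched in the QMIX section, whereas QPLEX's duplex-dueling mixer would additionally take the chosen actions). The greedy joint value at the next state is built agent-by-agent, as IGM allows.

```python
import torch

def factored_td_loss(q_tot, rewards, dones, next_agent_qs_target, next_states,
                     mixer_target, gamma=0.99):
    """One-step TD loss for a factorized Q_tot (illustrative shapes and names)."""
    # q_tot:                (batch,) Q_tot(s, a) from the online mixer for the taken actions
    # next_agent_qs_target: (batch, n_agents, n_actions) per-agent target-network values at s'
    # IGM: the greedy joint value at s' is obtained from per-agent maxima.
    greedy_next = next_agent_qs_target.max(dim=2).values     # (batch, n_agents)
    q_tot_next = mixer_target(greedy_next, next_states)      # (batch,)
    targets = rewards + gamma * (1.0 - dones) * q_tot_next
    return torch.mean((q_tot - targets.detach()) ** 2)
```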
## Role of Credit Assignment
Credit assignment addresses the question: "Which agent contributed what to the team reward?"
Value factorization supports implicit credit assignment:
- Dueling architectures allow each agent to learn its influence on the team value.
- QPLEX implicitly provides clean per-agent marginal contributions.
## Performance on SMAC Benchmarks
On the StarCraft Multi-Agent Challenge (SMAC) benchmark, QPLEX outperforms value-factorization baselines such as VDN and QMIX.
Key reasons:
- Strong representational capacity.
- Off-policy stability.
## Extensions: Diversity and Shared Parameter Learning
Parameter sharing improves sample efficiency, but can cause homogeneous agent behavior.
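One simple way to keep shared parameters while letting behavior differentiate is to condition the shared network on an agent identifier, a common baseline trick rather than the CDS method itself (CDS additionally separates agent-specific modules and adds an explicit diversity objective). A minimal sketch with hypothetical sizes:

```python
import torch
import torch.nn as nn

class SharedAgentNet(nn.Module):
    """One network shared by all agents, conditioned on a one-hot agent ID."""

    def __init__(self, obs_dim: int, n_actions: int, n_agents: int, hidden: int = 64):
        super().__init__()
        self.n_agents = n_agents
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_agents, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor, agent_id: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim); agent_id: (batch,) integer indices.
        one_hot = torch.nn.functional.one_hot(agent_id, self.n_agents).float()
        return self.net(torch.cat([obs, one_hot], dim=-1))  # per-action utilities Q_i

# All agents reuse the same parameters but can produce ID-specific policies.
net = SharedAgentNet(obs_dim=10, n_actions=5, n_agents=3)
q_values = net(torch.randn(4, 10), torch.tensor([0, 1, 2, 0]))
```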
Approaches such as CDS (Celebrating Diversity in Shared MARL) introduce agent-specific components and explicit diversity objectives on top of the shared parameters.
These techniques improve exploration and cooperation in complex multi-agent tasks.
## Summary of Lecture 24
Key points:
- QPLEX achieves full IGM representational capacity.
- Implicit credit assignment arises naturally from factorization.
- Diversity methods allow richer multi-agent coordination strategies.
## Recommended Screenshot Frames for Lecture 24
- Lecture 24, page 16: CTDE and QMIX architecture diagram (mixing network). Subsection: "QMIX: Monotonic Value Factorization".
- Lecture 24, page 31: QPLEX benchmark performance on SMAC. Subsection: "Performance on SMAC Benchmarks".