marl-mixing-network
Description
Cooperative MARL: Value Decomposition Mixing Network
Objective
Improve cooperative multi-agent reinforcement learning by designing a better value decomposition mixing network. You can modify the CustomMixer class (lines 13-49) and add custom imports (lines 7-8) in custom.py.
Background
In cooperative MARL, agents share a common reward but each agent has only a partial observation. Value decomposition methods learn individual agent Q-values and combine them into a joint Q_tot using a mixing network. The quality of this mixing network directly determines how well individual agents can coordinate.
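As a rough illustration of the data flow described above, the sketch below selects each agent's Q-value for the action it took and combines the results into a joint value. The tensor layout `(batch, T, n_agents, n_actions)`, the helper names `chosen_agent_qs` and `sum_mixer`, and the sizes used are assumptions made for this example; they are not taken from the task code.

```python
import torch as th


def chosen_agent_qs(all_qs: th.Tensor, actions: th.Tensor) -> th.Tensor:
    """Pick each agent's Q-value for the action it actually took.

    all_qs:  (batch, T, n_agents, n_actions) per-agent Q-values for every action
    actions: (batch, T, n_agents) integer action indices
    returns: (batch, T, n_agents)
    """
    return th.gather(all_qs, dim=3, index=actions.unsqueeze(-1)).squeeze(-1)


def sum_mixer(agent_qs: th.Tensor) -> th.Tensor:
    """Placeholder mixer: Q_tot is the plain sum of the agent Q-values."""
    return agent_qs.sum(dim=2, keepdim=True)


# Hypothetical sizes, chosen only for the demonstration.
batch, T, n_agents, n_actions = 2, 4, 3, 5
all_qs = th.randn(batch, T, n_agents, n_actions)
actions = th.randint(0, n_actions, (batch, T, n_agents))

agent_qs = chosen_agent_qs(all_qs, actions)  # (batch, T, n_agents)
q_tot = sum_mixer(agent_qs)                  # (batch, T, 1)
```

A learned mixing network would take the place of `sum_mixer` in this flow; the gathering step stays the same.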
The training uses EPyMARL with Q-learning on three PettingZoo MPE cooperative tasks:
- simple_spread: 3 agents must spread out to cover 3 landmarks while avoiding collisions.
- simple_tag: 3 predator agents cooperate to catch a pretrained prey controlled by a fixed DDPG policy.
- simple_speaker_listener: a speaker and listener must coordinate through communication to reach the correct target.
The default mixer is a simple learnable weighted sum that does not condition on the global state. Each setup trains for 2M environment timesteps with epsilon-greedy exploration.
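For orientation, a state-agnostic learnable weighted sum could look roughly like the sketch below. The actual default mixer in custom.py is not shown in this description, so the class name `LinearMixerSketch`, the parameter shapes, and the initialization (weights of one, zero bias) are assumptions.

```python
import torch as th
import torch.nn as nn


class LinearMixerSketch(nn.Module):
    """Assumed form of a state-agnostic mixer: Q_tot = sum_i w_i * Q_i + b."""

    def __init__(self, n_agents: int):
        super().__init__()
        # One learnable weight per agent plus a scalar bias.
        # Initialized so the mixer starts out as a plain sum (an assumption).
        self.weights = nn.Parameter(th.ones(n_agents))
        self.bias = nn.Parameter(th.zeros(1))

    def forward(self, agent_qs: th.Tensor, states: th.Tensor = None) -> th.Tensor:
        # states is accepted for interface compatibility but deliberately ignored.
        return (agent_qs * self.weights).sum(dim=-1, keepdim=True) + self.bias


mixer = LinearMixerSketch(n_agents=3)
agent_qs = th.randn(2, 4, 3)
q_tot = mixer(agent_qs)  # (2, 4, 1)
```

Because the weights do not depend on the global state, the same combination is applied in every situation, which is the limitation a state-conditioned mixer aims to remove.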
Interface
Your CustomMixer must:
- Inherit from nn.Module
- Accept args in __init__ with attributes: n_agents, state_shape, mixing_embed_dim
- Implement forward(self, agent_qs, states) where:
  - agent_qs: shape (batch, T, n_agents), individual agent Q-values
  - states: shape (batch, T, state_dim), global state information
  - Returns q_tot: shape (batch, T, 1), joint action value
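The interface requirements above can be checked with a small shape test before launching a long training run. This is a minimal sketch: `SumMixer` is a stand-in mixer, `check_mixer_interface` is a hypothetical helper, and the argument values (including treating `state_shape` as something `np.prod` can flatten to `state_dim`) are assumptions for illustration.

```python
from types import SimpleNamespace

import numpy as np
import torch as th
import torch.nn as nn


class SumMixer(nn.Module):
    """Stand-in mixer that satisfies the interface by summing agent Q-values."""

    def __init__(self, args):
        super().__init__()
        self.n_agents = args.n_agents

    def forward(self, agent_qs, states):
        return agent_qs.sum(dim=-1, keepdim=True)


def check_mixer_interface(mixer_cls, args, batch=4, T=7):
    """Build the mixer, feed random tensors of the documented shapes, check the output."""
    state_dim = int(np.prod(args.state_shape))  # assumes an int or tuple of ints
    mixer = mixer_cls(args)
    assert isinstance(mixer, nn.Module), "mixer must inherit from nn.Module"

    agent_qs = th.randn(batch, T, args.n_agents)
    states = th.randn(batch, T, state_dim)
    q_tot = mixer(agent_qs, states)

    assert q_tot.shape == (batch, T, 1), f"unexpected q_tot shape {tuple(q_tot.shape)}"
    return q_tot


# Hypothetical argument values; the real ones come from the training config.
args = SimpleNamespace(n_agents=3, state_shape=54, mixing_embed_dim=32)
q_tot = check_mixer_interface(SumMixer, args)
```

Passing a different class in place of `SumMixer` runs the same checks against it.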
Reference Implementations
- VDN (vdn.py): Simple sum, Q_tot = sum(Q_i). No parameters, no state conditioning.
- LinearMixer: Learnable weighted sum with a bias term. State-agnostic but more flexible than VDN.
- QMIX (qmix.py): Uses hypernetworks conditioned on global state to generate mixing weights. Enforces monotonicity via absolute value on weights.
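To make the contrast between the reference mixers concrete, here is a condensed sketch of a VDN-style sum and a QMIX-style mixer. It is written from the descriptions above rather than copied from vdn.py or qmix.py, so layer sizes, the ELU activation, the two-layer bias network, and the names `vdn_mix` and `QMixStyleMixer` should be read as assumptions.

```python
from types import SimpleNamespace

import numpy as np
import torch as th
import torch.nn as nn
import torch.nn.functional as F


def vdn_mix(agent_qs: th.Tensor) -> th.Tensor:
    """VDN-style mixing: Q_tot = sum(Q_i), no parameters, no state input."""
    return agent_qs.sum(dim=-1, keepdim=True)


class QMixStyleMixer(nn.Module):
    """Hypernetworks map the global state to the weights of a small mixing MLP."""

    def __init__(self, args):
        super().__init__()
        self.n_agents = args.n_agents
        self.state_dim = int(np.prod(args.state_shape))
        self.embed_dim = args.mixing_embed_dim
        # Hypernetworks: state -> mixing weights and biases.
        self.hyper_w1 = nn.Linear(self.state_dim, self.n_agents * self.embed_dim)
        self.hyper_b1 = nn.Linear(self.state_dim, self.embed_dim)
        self.hyper_w2 = nn.Linear(self.state_dim, self.embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(self.state_dim, self.embed_dim),
            nn.ReLU(),
            nn.Linear(self.embed_dim, 1),
        )

    def forward(self, agent_qs, states):
        batch, T, _ = agent_qs.shape
        qs = agent_qs.reshape(-1, 1, self.n_agents)
        s = states.reshape(-1, self.state_dim)
        # abs() keeps the mixing weights non-negative, so Q_tot cannot
        # decrease when any single agent's Q-value increases.
        w1 = th.abs(self.hyper_w1(s)).view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(s).view(-1, 1, self.embed_dim)
        hidden = F.elu(th.bmm(qs, w1) + b1)
        w2 = th.abs(self.hyper_w2(s)).view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(s).view(-1, 1, 1)
        q_tot = th.bmm(hidden, w2) + b2
        return q_tot.reshape(batch, T, 1)


# Hypothetical sizes for a quick smoke test.
args = SimpleNamespace(n_agents=3, state_shape=10, mixing_embed_dim=8)
mixer = QMixStyleMixer(args)
agent_qs = th.randn(2, 5, 3)
states = th.randn(2, 5, 10)
q_tot = mixer(agent_qs, states)  # (2, 5, 1)
```

The biases are left unconstrained because they shift Q_tot without affecting its dependence on the individual Q-values; only the weights need to be non-negative for monotonicity.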
Evaluation
Final performance is the mean episode return over 32 test episodes run with a greedy policy. Each of the three setups is evaluated separately, and its result is recorded to the leaderboard under a setup-specific metric key.
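The metric itself is a plain average, as the sketch below shows. The callable `run_episode` stands in for whatever the evaluation harness uses to roll out one greedy test episode; it and the helper name `mean_test_return` are hypothetical, and the stub returns made-up numbers purely to exercise the function.

```python
from statistics import mean
from typing import Callable


def mean_test_return(run_episode: Callable[[], float], n_episodes: int = 32) -> float:
    """Average the episode return over n_episodes greedy test rollouts."""
    returns = [run_episode() for _ in range(n_episodes)]
    return mean(returns)


# Stub rollout that yields 0.0, 1.0, 2.0, ... in place of a real environment.
_counter = iter(range(1000))


def fake_episode() -> float:
    return float(next(_counter))


score = mean_test_return(fake_episode)  # averages the returns 0.0 through 31.0
```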
Code
 1  import numpy as np
 2  import torch as th
 3  import torch.nn as nn
 4  import torch.nn.functional as F
 5
 6
 7  # ── Custom imports (editable) ────────────────────────────────────────────
 8
 9
10  # ======================================================================
11  # EDITABLE — Custom mixing network
12  # ======================================================================
13  class CustomMixer(nn.Module):
14      """Custom mixing network for cooperative MARL value decomposition.
15
Additional context files (read-only):
- epymarl/src/modules/mixers/vdn.py
- epymarl/src/modules/mixers/qmix.py
- epymarl/src/learners/q_learner.py
Results
No results yet.