mas-topology
Description
Multi-Agent Collaboration Topology Design
Objective
Design a novel multi-agent collaboration topology that maximizes the quality of LLM-generated code. You implement a generate_topology(node_num) function that returns directed edges forming a DAG. The agents are organized according to your topology: each agent receives a predecessor's code solution, reviews it, and produces an improved version. When a node has multiple predecessors, solutions are aggregated.
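The data flow described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `run_pipeline` and the `agent` callback are hypothetical names, aggregation is modeled as simply handing the agent a list of predecessor outputs, and the real MacNet review and aggregation steps (which call an LLM) are not reproduced here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(node_num, edges, task, agent):
    """Toy walk over a topology; `agent(node, inputs)` stands in for an LLM call."""
    # Map every node to the list of its predecessors.
    preds = {n: [] for n in range(node_num)}
    for src, dst in edges:
        preds[dst].append(src)

    outputs = {}
    # static_order() yields each node only after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        # Source nodes see the raw task; other nodes see predecessor solutions.
        inputs = [outputs[p] for p in preds[node]] or [task]
        outputs[node] = agent(node, inputs)
    return outputs


def toy_agent(node, inputs):
    """Hypothetical stand-in that just records which inputs it received."""
    return f"a{node}(" + "+".join(inputs) + ")"


diamond = [(0, 1), (0, 2), (1, 3), (2, 3)]
result = run_pipeline(4, diamond, "T", toy_agent)
print(result[3])
```

In the diamond above, node 3 has two predecessors, so its agent receives both of their solutions at once; in a chain, every agent would receive exactly one.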
Background
MacNet (Scaling Large-Language-Model-based Multi-Agent Collaboration) organizes LLM agents as nodes in a directed acyclic graph. The topology (graph structure) determines how agents collaborate and significantly impacts code generation quality. Simple chain topologies offer deep iterative refinement but no diversity; star topologies offer breadth but no depth; layered (MLP-like) topologies balance both.
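The three topology families named above can be sketched as edge generators. The function names and the `layer_sizes` parameter are illustrative assumptions, and the constructions in chatdev-macnet/generate_graph.py may differ in detail (for example in how nodes are assigned to layers).

```python
def chain_edges(node_num):
    """Chain: 0 -> 1 -> ... -> node_num-1 (depth, no breadth)."""
    return [(i, i + 1) for i in range(node_num - 1)]


def star_edges(node_num):
    """Star: node 0 feeds every other node directly (breadth, no depth)."""
    return [(0, i) for i in range(1, node_num)]


def layered_edges(layer_sizes):
    """Layered (MLP-like): each layer is fully connected to the next one.

    `layer_sizes` is a hypothetical parameter, e.g. [1, 2, 1] for 4 nodes.
    Nodes are numbered consecutively, layer by layer.
    """
    edges = []
    previous_layer = []
    next_id = 0
    for size in layer_sizes:
        layer = list(range(next_id, next_id + size))
        edges.extend((u, v) for u in previous_layer for v in layer)
        previous_layer = layer
        next_id += size
    return edges


print(chain_edges(4))             # [(0, 1), (1, 2), (2, 3)]
print(star_edges(4))              # [(0, 1), (0, 2), (0, 3)]
print(layered_edges([1, 2, 1]))   # [(0, 1), (0, 2), (1, 3), (2, 3)]
```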
Editable Interface
Modify custom_topology.py which contains:
def generate_topology(node_num: int) -> list[tuple[int, int]]:
    """Return directed edges (source, target) forming a DAG over nodes 0..node_num-1."""
Constraints
- Must return a valid DAG (no cycles)
- All nodes 0 to node_num-1 must be reachable from the input sentinel
- Edges should go from lower-numbered nodes to higher-numbered nodes (or at least respect topological order)
- The system automatically adds input sentinel (-1) connecting to source nodes and output sentinel (-2) connecting from sink nodes
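Before submitting a topology, the constraints above can be checked locally. The following is a minimal sketch, not the harness's own validation: `is_valid_dag` and `sentinel_edges` are hypothetical helpers, the acyclicity check uses Kahn's algorithm, and the sentinel computation only mirrors the rule as described here (-1 feeds nodes without predecessors, nodes without successors feed -2).

```python
from collections import defaultdict, deque


def is_valid_dag(node_num, edges):
    """Return True if all edges use ids in 0..node_num-1 and contain no cycle."""
    in_degree = [0] * node_num
    successors = defaultdict(list)
    for src, dst in edges:
        in_range = 0 <= src < node_num and 0 <= dst < node_num
        if not in_range or src == dst:
            return False
        successors[src].append(dst)
        in_degree[dst] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no remaining inputs.
    queue = deque(n for n in range(node_num) if in_degree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)
    # Nodes stuck on a cycle are never visited.
    return visited == node_num


def sentinel_edges(node_num, edges):
    """Edges the system would add, following the rule stated in the constraints."""
    has_predecessor = {dst for _, dst in edges}
    has_successor = {src for src, _ in edges}
    inputs = [(-1, n) for n in range(node_num) if n not in has_predecessor]
    outputs = [(n, -2) for n in range(node_num) if n not in has_successor]
    return inputs + outputs


print(is_valid_dag(4, [(0, 1), (1, 2), (2, 3)]))    # True
print(is_valid_dag(3, [(0, 1), (1, 2), (2, 0)]))    # False
print(sentinel_edges(4, [(0, 1), (0, 2), (0, 3)]))  # [(-1, 0), (1, -2), (2, -2), (3, -2)]
```

Because the input sentinel connects to every node without predecessors, every node of a valid DAG is reachable from it, including isolated nodes, which become both a source and a sink.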
Evaluation
Your topology is evaluated on two benchmarks using 4 agent nodes:
HumanEval (humaneval-4)
A 33-problem subset of HumanEval coding problems with unit tests. For each problem, the multi-agent system collaborates according to your topology to generate a Python function, which is then tested against the problem's unit tests.
- Metric: pass@1 = fraction of problems where the generated code passes all unit tests on the first attempt.
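As a small worked example of the metric, assuming one boolean outcome per problem (the `pass_at_1` helper is hypothetical, not part of the evaluation code):

```python
def pass_at_1(passed):
    """Fraction of problems whose single generated solution passed all tests."""
    if not passed:
        raise ValueError("need at least one problem result")
    return sum(passed) / len(passed)


# 29 passing problems out of 33 gives 29/33, roughly 0.879.
outcomes = [True] * 29 + [False] * 4
print(round(pass_at_1(outcomes), 3))  # 0.879
```

With only 33 problems, one additional pass or failure moves the score by about 0.03, so small differences between topologies should be read with care.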
SRDD (srdd-4)
20 curated software development prompts from the SRDD (Software Requirement Document Dataset) categories. For each prompt, agents collaborate to generate a complete software project. The generated project is tested for executability: whether its entry point (main.py) runs without crashing.
- Metric: srdd_exec_rate = fraction of generated projects that execute successfully (exit code 0 or still running at timeout, with no Traceback in stderr).
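The executability criterion can be sketched as follows. This is an approximation of the rule as stated, not the harness's actual code: `runs_successfully` is a hypothetical helper, and the timeout value is an assumption because the task description does not specify it.

```python
import subprocess
import sys
from pathlib import Path


def runs_successfully(entry_point, timeout_s=10.0):
    """True if the script exits with code 0 or is still running at the timeout,
    and nothing resembling a Python traceback was written to stderr."""
    path = Path(entry_point).resolve()
    try:
        proc = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            timeout=timeout_s,   # assumed value; not given in the task description
            cwd=path.parent,     # run from the project directory
        )
        exit_ok = proc.returncode == 0
        stderr = proc.stderr
    except subprocess.TimeoutExpired as exc:
        # Still running at the timeout (e.g. a GUI or server loop) counts as success.
        exit_ok = True
        stderr = exc.stderr or ""
        if isinstance(stderr, bytes):
            stderr = stderr.decode(errors="replace")
    return exit_ok and "Traceback" not in stderr
```

Under this sketch, srdd_exec_rate would be the number of projects for which the check returns True divided by the number of prompts (20 in srdd-4).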
The default chain topology (0->1->2->...->N-1) is the baseline. A better topology enables richer agent collaboration, yielding code that passes more unit tests and software projects that execute more reliably.
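As one illustration of an alternative to the chain, the sketch below implements `generate_topology` as a fan-out/fan-in shape: the first node feeds all middle nodes, which all feed the last node. This is only an example of the interface, chosen for simplicity; nothing here claims it scores better than the baseline.

```python
def generate_topology(node_num: int) -> list[tuple[int, int]]:
    """Illustrative fan-out/fan-in DAG over nodes 0..node_num-1.

    Edges always go from a lower-numbered node to a higher-numbered one,
    so the result is acyclic by construction.
    """
    if node_num < 2:
        return []              # a single node needs no edges
    if node_num == 2:
        return [(0, 1)]        # degenerates to a two-node chain

    first, last = 0, node_num - 1
    middle = range(1, last)
    fan_out = [(first, m) for m in middle]   # diversity: parallel reviewers
    fan_in = [(m, last) for m in middle]     # aggregation at the final node
    return fan_out + fan_in


print(generate_topology(4))  # [(0, 1), (0, 2), (1, 3), (2, 3)]
```

For the 4-node evaluation setting this yields a diamond: one initial solution, two independent revisions, and a final agent that aggregates both.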
Note on reproducibility: The topology is deterministic (same node_num -> same edges). Variability across seeds comes from the LLM API responses, not the topology. Multiple seeds test robustness of a topology under different LLM sampling outcomes.
Network requirement: This task requires internet access at runtime to call LLM APIs (DeepSeek by default). Set the DEEPSEEK_API_KEY environment variable before running. The task is not compatible with offline or air-gapped compute nodes.
Code
"""Custom multi-agent collaboration topology.

This module defines the DAG topology for multi-agent code generation.
The function generate_topology(node_num) returns a list of directed edges
that determine how LLM agents collaborate to produce code.

The system will automatically add:
- An input sentinel node (-1) connecting to all source nodes (no predecessors)
- An output sentinel node (-2) connecting from all sink nodes (no successors)
"""


# ── Editable topology function ───────────────────────────────────────
# EDITABLE REGION START
Additional context files (read-only):
- chatdev-macnet/generate_graph.py
- chatdev-macnet/graph.py
Results
| Model | Type | pass@1 ↑ |
|---|---|---|
| chain@deepseek-chat | baseline | 0.879 |
| chain@qwen2.5-72b-instruct | baseline | 0.758 |
| layered@deepseek-chat | baseline | 0.758 |
| layered@qwen2.5-72b-instruct | baseline | 0.636 |
| star@deepseek-chat | baseline | 0.849 |
| star@qwen2.5-72b-instruct | baseline | 0.667 |