mas-topology

Deep Learning | chatdev-macnet | rigorous codebase

Description

Multi-Agent Collaboration Topology Design

Objective

Design a novel multi-agent collaboration topology that maximizes the quality of LLM-generated code. You implement a generate_topology(node_num) function that returns directed edges forming a DAG. The agents are organized according to your topology: each agent receives a predecessor's code solution, reviews it, and produces an improved version. When a node has multiple predecessors, solutions are aggregated.

Background

MacNet (Scaling Large-Language-Model-based Multi-Agent Collaboration) organizes LLM agents as nodes in a directed acyclic graph. The topology (graph structure) determines how agents collaborate and significantly impacts code generation quality. Simple chain topologies offer deep iterative refinement but no diversity; star topologies offer breadth but no depth; layered (MLP-like) topologies balance both.
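The three reference shapes named above can be sketched as edge generators. This is a minimal illustration, not MacNet's own code: the function names chain, star, and layered, the choice of node 0 as the star's hub, and the layer_sizes parameter are all assumptions made for this example.

```python
def chain(node_num: int) -> list[tuple[int, int]]:
    """0 -> 1 -> ... -> node_num-1: one long refinement path, no parallel branches."""
    return [(i, i + 1) for i in range(node_num - 1)]


def star(node_num: int) -> list[tuple[int, int]]:
    """Node 0 (assumed hub) fans out to every other node: wide, but only one hop deep."""
    return [(0, i) for i in range(1, node_num)]


def layered(layer_sizes: list[int]) -> list[tuple[int, int]]:
    """Fully connect each layer to the next, numbering nodes layer by layer."""
    edges = []
    start = 0  # index of the first node in the current layer
    for size, next_size in zip(layer_sizes, layer_sizes[1:]):
        next_start = start + size
        for u in range(start, next_start):
            for v in range(next_start, next_start + next_size):
                edges.append((u, v))
        start = next_start
    return edges
```

With 4 nodes, layered([1, 2, 1]) gives a diamond: node 0 feeds nodes 1 and 2, and both feed node 3, so node 3 is the only node that aggregates multiple predecessors.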

Editable Interface

Modify custom_topology.py which contains:

def generate_topology(node_num: int) -> list[tuple[int, int]]:
    """Return directed edges (source, target) forming a DAG over nodes 0..node_num-1."""

Constraints

Must return a valid DAG (no cycles)
All nodes 0 to node_num-1 must be reachable from the input sentinel
Edges should go from lower-numbered nodes to higher-numbered nodes (or at least respect topological order)
The system automatically adds input sentinel (-1) connecting to source nodes and output sentinel (-2) connecting from sink nodes
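The constraints above can be checked locally before spending API calls. The sketch below is an assumption about how such a check might look, not the system's actual validation code: is_valid_topology and add_sentinels are hypothetical helper names. It uses Kahn's algorithm for the cycle check and mirrors the sentinel rule as described in the last constraint.

```python
from collections import deque


def is_valid_topology(edges: list[tuple[int, int]], node_num: int) -> bool:
    """Return True if all endpoints are in 0..node_num-1 and the graph is acyclic."""
    indegree = [0] * node_num
    successors = [[] for _ in range(node_num)]
    for u, v in edges:
        if not (0 <= u < node_num and 0 <= v < node_num):
            return False
        successors[u].append(v)
        indegree[v] += 1

    # Kahn's algorithm: repeatedly remove nodes with no remaining predecessors.
    queue = deque(n for n in range(node_num) if indegree[n] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # If a cycle exists, its nodes never reach indegree 0 and are never visited.
    return visited == node_num


def add_sentinels(edges: list[tuple[int, int]], node_num: int) -> list[tuple[int, int]]:
    """Illustrative only: connect -1 to every source and every sink to -2."""
    has_pred = {v for _, v in edges}
    has_succ = {u for u, _ in edges}
    sources = [n for n in range(node_num) if n not in has_pred]
    sinks = [n for n in range(node_num) if n not in has_succ]
    return [(-1, s) for s in sources] + list(edges) + [(t, -2) for t in sinks]
```

Because the input sentinel is attached to every node without predecessors, any acyclic edge list over 0..node_num-1 already satisfies the reachability constraint: following predecessors backward from any node must end at a source.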

Evaluation

Your topology is evaluated on two benchmarks using 4 agent nodes:

HumanEval (humaneval-4)

A 33-problem subset of HumanEval coding problems with unit tests. For each problem, the multi-agent system collaborates according to your topology to generate a Python function, which is then tested against the problem's unit tests.

Metric: pass@1 = fraction of problems where the generated code passes all unit tests on the first attempt.
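As a worked illustration of the metric, with one attempt per problem pass@1 is the mean of per-problem pass/fail outcomes. The helper name pass_at_1 and the outcome list are hypothetical, chosen for this sketch.

```python
def pass_at_1(outcomes: list[bool]) -> float:
    """Fraction of problems whose single generated solution passed all unit tests."""
    if not outcomes:
        raise ValueError("no problems were evaluated")
    return sum(outcomes) / len(outcomes)


# Illustrative numbers only: 29 passes out of 33 problems.
example = [True] * 29 + [False] * 4
score = round(pass_at_1(example), 3)  # 29 / 33 rounds to 0.879
```

On a 33-problem subset each problem is worth about 0.03, so small score differences between topologies correspond to one or two problems.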

SRDD (srdd-4)

20 curated software development prompts drawn from the categories of SRDD (Software Requirement Description Dataset). For each prompt, agents collaborate to generate a complete software project. The generated project is tested for executability: whether its entry point (main.py) runs without crashing.

Metric: srdd_exec_rate = fraction of generated projects that execute successfully (exit code 0 or still running at timeout, with no Traceback in stderr).
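One way to read the success criterion is sketched below. This is an assumption about how such a check could be written, not the benchmark's actual harness: the function name runs_cleanly and the 10-second default timeout are hypothetical.

```python
import subprocess
import sys


def runs_cleanly(project_dir: str, timeout_s: float = 10.0) -> bool:
    """Run main.py in project_dir; succeed on exit code 0 or a timeout, with no Traceback."""
    try:
        proc = subprocess.run(
            [sys.executable, "main.py"],
            cwd=project_dir,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # Still running at the timeout counts as success unless stderr shows a crash.
        stderr = exc.stderr or ""
        if isinstance(stderr, bytes):
            stderr = stderr.decode(errors="replace")
        return "Traceback" not in stderr
    return proc.returncode == 0 and "Traceback" not in proc.stderr
```

Under this reading, srdd_exec_rate would be the number of project directories for which the check returns True, divided by the number of prompts.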

The default chain topology (0->1->2->...->N-1) is the baseline. Better topologies enable richer agent collaboration, producing more correct code that passes more unit tests and more executable software projects.

Note on reproducibility: The topology is deterministic (same node_num -> same edges). Variability across seeds comes from the LLM API responses, not the topology. Multiple seeds test robustness of a topology under different LLM sampling outcomes.

Network requirement: This task requires internet access at runtime to call LLM APIs (DeepSeek by default). Set the DEEPSEEK_API_KEY environment variable before running. The task is not compatible with offline or air-gapped compute nodes.
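A run can fail fast when the key is missing instead of failing on the first API call. The helper below is a hypothetical pre-flight check, not part of the task's codebase; only the variable name DEEPSEEK_API_KEY comes from the task description.

```python
import os


def require_api_key(name: str = "DEEPSEEK_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset or blank."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the evaluation")
    return key
```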

Code

custom_topology.py

"""Custom multi-agent collaboration topology.

This module defines the DAG topology for multi-agent code generation.
The function generate_topology(node_num) returns a list of directed edges
that determine how LLM agents collaborate to produce code.

The system will automatically add:
  - An input sentinel node (-1) connecting to all source nodes (no predecessors)
  - An output sentinel node (-2) connecting from all sink nodes (no successors)
"""


# ── Editable topology function ───────────────────────────────────────
# EDITABLE REGION START

Additional context files (read-only):

chatdev-macnet/generate_graph.py
chatdev-macnet/graph.py

Results

Topology@Model	Type	pass@1 ↑
chain@deepseek-chat	baseline	0.879
chain@qwen2.5-72b-instruct	baseline	0.758
layered@deepseek-chat	baseline	0.758
layered@qwen2.5-72b-instruct	baseline	0.636
star@deepseek-chat	baseline	0.849
star@qwen2.5-72b-instruct	baseline	0.667