Discrete Causal Graph Discovery

Studies how causal discovery algorithms recover equivalence-class graph structure from discrete observational data.

Structured & Causal Reasoningcausal-bnlearn
causal-discovery-discrete

Description

Causal Discovery on Discrete Bayesian Network Datasets (bnlearn)

Research Question

Design a causal discovery algorithm that recovers the CPDAG (Completed Partially Directed Acyclic Graph) from purely observational, integer-coded discrete data sampled from real-world Bayesian networks in the bnlearn repository.

Background

The bnlearn repository (https://www.bnlearn.com/bnrepository/) hosts a collection of well-known Bayesian network benchmarks from diverse domains (medicine, biology, meteorology, insurance, agriculture, IT). Each network has a known ground-truth DAG with discrete variables and conditional probability tables.

Under the faithfulness assumption, observational data can identify only the Markov Equivalence Class (MEC) of the true DAG, represented by a CPDAG. The challenge lies in handling discrete data with varying cardinalities, network sizes (small to >70 nodes), and edge densities, without over-specializing to a single scale or cardinality pattern.

Task

Implement a causal discovery algorithm in bench/custom_algorithm.py. The run_causal_discovery(X) function receives integer-encoded discrete observational data and must return the estimated CPDAG as a causallearn.graph.GeneralGraph.GeneralGraph object.

def run_causal_discovery(X: np.ndarray) -> GeneralGraph:
    """
    Input:  X of shape (n_samples, n_variables), integer-encoded discrete data
    Output: estimated CPDAG as causallearn.graph.GeneralGraph.GeneralGraph
    """

Evaluation Networks

LabelNodesEdgesDomain
Cancer54Medical
Child2025Medical
Alarm3746Medical monitoring
Hailfinder5666Meteorology
Win95pts76112IT (Windows troubleshooting)

Each network is sampled with a fixed observational sample size; the agent must generalize across small/medium/large networks and across different cardinality patterns.

Metrics

Metrics are computed between the estimated CPDAG and the ground-truth CPDAG (converted from the true DAG via dag2cpdag):

  • SHD (Structural Hamming Distance): total edge errors (lower is better)
  • Adjacency Precision / Recall: skeleton recovery quality (higher is better)
  • Arrow Precision / Recall: edge orientation accuracy (higher is better)

Reference baselines

The benchmark ships several classical baselines for comparison. Citations are provided so the agent can study the prior art; default hyperparameters are the ones recommended in the cited papers (e.g., chi-squared CI test for PC, BDeu score for the score-based methods).

  • pc: Peter-Clark algorithm with chi-squared CI test. Constraint-based. Spirtes, Glymour & Scheines, Causation, Prediction, and Search (MIT Press, 2nd ed., 2000).
  • ges: Greedy Equivalence Search with BDeu score. Score-based. Chickering, "Optimal Structure Identification With Greedy Search," JMLR 3, 2002.
  • grasp: Greedy Relaxations of the Sparsest Permutation with BDeu score. Permutation-based. Lam, Andrews & Ramsey, "Greedy Relaxations of the Sparsest Permutation Algorithm," UAI 2022 (arXiv:2206.05421).
  • boss: Best Order Score Search with BDeu score. Permutation-based. Andrews et al., "Fast Scalable and Accurate Discovery of DAGs Using the Best Order Score Search and Grow-Shrink Trees," NeurIPS 2023 (arXiv:2310.17679).
  • hc: Hill-Climbing search with BDeu score. Score-based, classical local search baseline.

The contribution should be a modular causal discovery procedure for discrete observational data, such as a constraint-based, score-based, permutation-based, hybrid, or otherwise principled alternative, while staying within the provided causal graph interface.

Code

custom_algorithm.py
EditableRead-only
1import numpy as np
2from causallearn.graph.GeneralGraph import GeneralGraph
3from causallearn.graph.GraphNode import GraphNode
4
5# =====================================================================
6# EDITABLE: implement run_causal_discovery below
7# =====================================================================
8def run_causal_discovery(X: np.ndarray) -> GeneralGraph:
9 """
10 Input: X of shape (n_samples, n_variables), integer-encoded discrete data
11 Output: estimated CPDAG as causallearn.graph.GeneralGraph.GeneralGraph
12 """
13 nodes = [GraphNode(f"X{i + 1}") for i in range(X.shape[1])]
14 return GeneralGraph(nodes)
15# =====================================================================
run_eval.py
EditableRead-only
1"""Evaluation harness for the causal-discovery-discrete task."""
2import argparse
3import os
4import sys
5
6sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
7
8from custom_algorithm import run_causal_discovery
9from data_gen import load_and_sample
10from metrics import compute_metrics
11
12
13def main():
14 parser = argparse.ArgumentParser(
15 description="Evaluate CPDAG recovery on bnlearn discrete data."
data_gen.py
EditableRead-only
1"""Data loader for bnlearn Bayesian network benchmarks.
2
3Uses pgmpy's bundled BIF files (no network access needed) to load
4real-world Bayesian networks, sample discrete observational data,
5and extract the ground-truth DAG.
6"""
7import numpy as np
8import pandas as pd
9
10from causallearn.graph.Dag import Dag
11from causallearn.graph.GraphNode import GraphNode
12
13# All discrete bnlearn networks supported by pgmpy
14SUPPORTED_NETWORKS = [
15 "cancer", "earthquake", "survey", "asia", "sachs",
metrics.py
EditableRead-only
1"""Evaluation metrics for CPDAG recovery on bnlearn discrete data."""
2from causallearn.graph.AdjacencyConfusion import AdjacencyConfusion
3from causallearn.graph.ArrowConfusion import ArrowConfusion
4from causallearn.graph.Graph import Graph
5from causallearn.graph.SHD import SHD
6from causallearn.utils.DAG2CPDAG import dag2cpdag
7
8
9def _safe_div(numerator, denominator):
10 return numerator / denominator if denominator > 0 else 0.0
11
12
13def _normalize_graph_output(graph_output):
14 """Normalize algorithm output to a causallearn Graph object."""
15 if isinstance(graph_output, dict) and "G" in graph_output:

Method Summary

Auto-summarized from each method's code by an LLM reviewer — not the model's original output. Browse via the picker below; the Code section is independent.
Baselines
Agents
Claude Opus 4.6·Pseudocodehigh

GES + size-gated G-test pruning

Run GES with BDeu, then on graphs with more than 20 variables drop edges that pass a chi-squared CI test conditioned on safe-cardinality neighbour subsets.

1. GGES(X,BDeu)G \leftarrow \mathrm{GES}(X,\,\text{BDeu})
2. if p20p \le 20 return GG
3. for each edge (i,j)G(i,j) \in G:
4. Nadj(i)adj(j){i,j}\mathcal{N} \leftarrow \mathrm{adj}(i) \cup \mathrm{adj}(j) \setminus \{i,j\}
5. build candidate sep-sets S={,{k}kN,N}\mathcal{S} = \{\emptyset, \{k\}_{k\in\mathcal{N}}, \mathcal{N}\}
6. keep only SS with rirjkSrkn/5r_i\,r_j\,\prod_{k\in S} r_k \le n/5
7. if S\exists S with G-test p-value >0.01> 0.01, remove edge (i,j)(i,j)
8. return GG
Δ vs. baselineStarts from the GES baseline but adds a post-hoc G-test (gsq) edge-pruning pass restricted to neighbourhood-only conditioning sets and only triggered for large graphs.
size_gate=20alpha=0.01card_budget=n/5Recovers Plain GES on small graphs

Results