Discrete Causal Graph Discovery
Studies how causal discovery algorithms recover equivalence-class graph structure from discrete observational data.

Description
Causal Discovery on Discrete Bayesian Network Datasets (bnlearn)
Research Question
Design a causal discovery algorithm that recovers the CPDAG (Completed Partially Directed Acyclic Graph) from purely observational, integer-coded discrete data sampled from real-world Bayesian networks in the bnlearn repository.
Background
The bnlearn repository (https://www.bnlearn.com/bnrepository/) hosts a collection of well-known Bayesian network benchmarks from diverse domains (medicine, biology, meteorology, insurance, agriculture, IT). Each network has a known ground-truth DAG with discrete variables and conditional probability tables.
Under the faithfulness assumption, observational data can identify only the Markov Equivalence Class (MEC) of the true DAG, represented by a CPDAG. The challenge lies in handling discrete data with varying cardinalities, network sizes (small to >70 nodes), and edge densities, without over-specializing to a single scale or cardinality pattern.
Task
Implement a causal discovery algorithm in bench/custom_algorithm.py. The
run_causal_discovery(X) function receives integer-encoded discrete
observational data and must return the estimated CPDAG as a
causallearn.graph.GeneralGraph.GeneralGraph object.
def run_causal_discovery(X: np.ndarray) -> GeneralGraph:
"""
Input: X of shape (n_samples, n_variables), integer-encoded discrete data
Output: estimated CPDAG as causallearn.graph.GeneralGraph.GeneralGraph
"""
Evaluation Networks
| Label | Nodes | Edges | Domain |
|---|---|---|---|
| Cancer | 5 | 4 | Medical |
| Child | 20 | 25 | Medical |
| Alarm | 37 | 46 | Medical monitoring |
| Hailfinder | 56 | 66 | Meteorology |
| Win95pts | 76 | 112 | IT (Windows troubleshooting) |
Each network is sampled with a fixed observational sample size; the agent must generalize across small/medium/large networks and across different cardinality patterns.
Metrics
Metrics are computed between the estimated CPDAG and the ground-truth CPDAG
(converted from the true DAG via dag2cpdag):
- SHD (Structural Hamming Distance): total edge errors (lower is better)
- Adjacency Precision / Recall: skeleton recovery quality (higher is better)
- Arrow Precision / Recall: edge orientation accuracy (higher is better)
Reference baselines
The benchmark ships several classical baselines for comparison. Citations are provided so the agent can study the prior art; default hyperparameters are the ones recommended in the cited papers (e.g., chi-squared CI test for PC, BDeu score for the score-based methods).
pc: Peter-Clark algorithm with chi-squared CI test. Constraint-based. Spirtes, Glymour & Scheines, Causation, Prediction, and Search (MIT Press, 2nd ed., 2000).ges: Greedy Equivalence Search with BDeu score. Score-based. Chickering, "Optimal Structure Identification With Greedy Search," JMLR 3, 2002.grasp: Greedy Relaxations of the Sparsest Permutation with BDeu score. Permutation-based. Lam, Andrews & Ramsey, "Greedy Relaxations of the Sparsest Permutation Algorithm," UAI 2022 (arXiv:2206.05421).boss: Best Order Score Search with BDeu score. Permutation-based. Andrews et al., "Fast Scalable and Accurate Discovery of DAGs Using the Best Order Score Search and Grow-Shrink Trees," NeurIPS 2023 (arXiv:2310.17679).hc: Hill-Climbing search with BDeu score. Score-based, classical local search baseline.
The contribution should be a modular causal discovery procedure for discrete observational data, such as a constraint-based, score-based, permutation-based, hybrid, or otherwise principled alternative, while staying within the provided causal graph interface.
Code
1import numpy as np2from causallearn.graph.GeneralGraph import GeneralGraph3from causallearn.graph.GraphNode import GraphNode45# =====================================================================6# EDITABLE: implement run_causal_discovery below7# =====================================================================8def run_causal_discovery(X: np.ndarray) -> GeneralGraph:9"""10Input: X of shape (n_samples, n_variables), integer-encoded discrete data11Output: estimated CPDAG as causallearn.graph.GeneralGraph.GeneralGraph12"""13nodes = [GraphNode(f"X{i + 1}") for i in range(X.shape[1])]14return GeneralGraph(nodes)15# =====================================================================
1"""Evaluation harness for the causal-discovery-discrete task."""2import argparse3import os4import sys56sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))78from custom_algorithm import run_causal_discovery9from data_gen import load_and_sample10from metrics import compute_metrics111213def main():14parser = argparse.ArgumentParser(15description="Evaluate CPDAG recovery on bnlearn discrete data."
1"""Data loader for bnlearn Bayesian network benchmarks.23Uses pgmpy's bundled BIF files (no network access needed) to load4real-world Bayesian networks, sample discrete observational data,5and extract the ground-truth DAG.6"""7import numpy as np8import pandas as pd910from causallearn.graph.Dag import Dag11from causallearn.graph.GraphNode import GraphNode1213# All discrete bnlearn networks supported by pgmpy14SUPPORTED_NETWORKS = [15"cancer", "earthquake", "survey", "asia", "sachs",
1"""Evaluation metrics for CPDAG recovery on bnlearn discrete data."""2from causallearn.graph.AdjacencyConfusion import AdjacencyConfusion3from causallearn.graph.ArrowConfusion import ArrowConfusion4from causallearn.graph.Graph import Graph5from causallearn.graph.SHD import SHD6from causallearn.utils.DAG2CPDAG import dag2cpdag789def _safe_div(numerator, denominator):10return numerator / denominator if denominator > 0 else 0.0111213def _normalize_graph_output(graph_output):14"""Normalize algorithm output to a causallearn Graph object."""15if isinstance(graph_output, dict) and "G" in graph_output:
Method Summary
GES + size-gated G-test pruning
Run GES with BDeu, then on graphs with more than 20 variables drop edges that pass a chi-squared CI test conditioned on safe-cardinality neighbour subsets.
1.2. if return3. for each edge :4.5. build candidate sep-sets6. keep only with7. if with G-test p-value , remove edge8. return