Agent Conversation: anthropic/claude-opus-4.6
Model: anthropic/claude-opus-4.6
Total steps: 5
Messages: 14
User (Step 0)
Task: causal-observational-linear-non-gaussian
Causal Discovery: Observational Linear Non-Gaussian Data
Objective
Implement a causal discovery algorithm that recovers the DAG structure from purely observational data generated by a Linear Non-Gaussian Acyclic Model (LiNGAM). Your code goes in bench/custom_algorithm.py.
Background
LiNGAM-based methods exploit non-Gaussian noise to achieve full DAG identifiability from observational data alone, going beyond the Markov Equivalence Class limit of constraint-based (PC) and score-based (GES) methods.
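As a toy illustration of why non-Gaussian noise makes the direction identifiable (a minimal sketch, not part of the task code; the 2-variable model, Laplace noise, and the tanh-based dependence proxy are illustrative choices): regressing the effect on the cause leaves a residual independent of the regressor, while the reverse regression does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
e1 = rng.laplace(size=n)
e2 = rng.laplace(size=n)
x1 = e1
x2 = 0.8 * x1 + e2  # true model: x1 -> x2

def dependence(x, y):
    """Nonlinear-correlation proxy for dependence between regressor x and
    the residual of regressing y on x (a DirectLiNGAM-style measure)."""
    b = (x @ y) / (x @ x)
    r = y - b * x
    xs = (x - x.mean()) / x.std()
    rs = (r - r.mean()) / r.std()
    m = x.size
    return abs(np.tanh(xs) @ rs / m) + abs(xs @ np.tanh(rs) / m)

d_causal = dependence(x1, x2)  # residual ~ e2, independent of x1: near zero
d_anti = dependence(x2, x1)    # reverse regression: residual depends on x2
```

With Gaussian noise both directions would score (near) zero and only the Markov equivalence class would be recoverable; the non-Gaussian noise is what breaks the tie.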
Evaluation Scenarios
| Label | Graph type | Nodes | Samples | Noise |
|---|---|---|---|---|
| ER10 | Erdos-Renyi | 10 | 250 | Exponential |
| ER15 | Erdos-Renyi | 15 | 500 | Laplace |
| SF12 | Scale-Free (BA) | 12 | 300 | Uniform |
| ER30 | Erdos-Renyi | 30 | 1000 | Laplace |
| ER50 | Erdos-Renyi | 50 | 2000 | Exponential |
| ER50-LowSample | Erdos-Renyi | 50 | 250 | Exponential |
| SF100 | Scale-Free (BA) | 100 | 1000 | Uniform |
| ER20-Dense | Erdos-Renyi | 20 | 500 | Laplace |
Metrics
All computed on the directed edge set (skeleton + direction must be correct):
- F1 (primary ranking metric), SHD, Precision, Recall
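The metrics on the directed edge set can be sketched as follows (an assumption on SHD's convention: missing and extra edges count once each, and a reversed edge is collapsed to a single error; benchmarks differ on this point).

```python
import numpy as np

def directed_edge_metrics(B_true, B_est):
    # Compare directed edge supports, ignoring edge weights
    T = (np.asarray(B_true) != 0)
    E = (np.asarray(B_est) != 0)
    tp = int(np.sum(T & E))
    fp = int(np.sum(~T & E))
    fn = int(np.sum(T & ~E))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    # A reversal contributes one fp and one fn; count it once instead of twice
    reversed_edges = int(np.sum(T & ~E & E.T & ~T.T))
    shd = fp + fn - reversed_edges
    return {"f1": f1, "shd": shd, "precision": precision, "recall": recall}

# B[i, j] != 0 means j -> i: true graph is 0 -> 1 -> 2
B_true = np.zeros((3, 3)); B_true[1, 0] = 1.0; B_true[2, 1] = 1.0
# Estimate gets 0 -> 1 right but reverses 1 -> 2
B_est = np.zeros((3, 3)); B_est[1, 0] = 1.0; B_est[1, 2] = 1.0
m = directed_edge_metrics(B_true, B_est)  # f1 = 0.5, shd = 1
```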
Baselines
icalingam: ICA-based LiNGAM (Shimizu 2006)
directlingam: DirectLiNGAM (Shimizu 2011)
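The ICA-LiNGAM recipe the first baseline follows can be sketched briefly (a minimal version under stated assumptions, not the baseline's exact code; `whiten="unit-variance"` assumes scikit-learn >= 1.1). Since X = (I - B)^{-1} E, ICA's unmixing matrix equals P D (I - B) up to a row permutation P and scaling D, both of which are undone below.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.decomposition import FastICA

def ica_lingam(X, random_state=0):
    p = X.shape[1]
    ica = FastICA(n_components=p, whiten="unit-variance",
                  max_iter=2000, tol=1e-6, random_state=random_state)
    ica.fit(X)
    W = ica.components_  # estimated unmixing matrix, rows in arbitrary order
    # Permute rows so the diagonal is as large as possible: the Hungarian
    # algorithm on cost 1/|W| finds the minimum-cost assignment
    _, col_ind = linear_sum_assignment(1.0 / (np.abs(W) + 1e-12))
    W = W[np.argsort(col_ind)]
    W = W / np.diag(W)[:, None]  # rescale rows to a unit diagonal
    return np.eye(p) - W         # B[i, j] ~ direct effect of j on i

rng = np.random.default_rng(0)
n = 10000
e = rng.laplace(size=(n, 3))
x0 = e[:, 0]
x1 = 1.5 * x0 + e[:, 1]
x2 = -1.2 * x1 + e[:, 2]
B_est = ica_lingam(np.column_stack([x0, x1, x2]))
```

DirectLiNGAM instead builds the causal order one root at a time from residual independence, avoiding ICA's local optima at the cost of more regressions.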
causal-learn/bench/custom_algorithm.py [EDITABLE — lines 3–14 only]
1: import numpy as np
2:
3: # =====================================================================
4: # EDITABLE: implement run_causal_discovery below
5: # =====================================================================
6: def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7:     """
8:     Input: X of shape (n_samples, n_variables)
9:     Output: adjacency matrix B of shape (n_variables, n_variables)
10:     B[i, j] != 0 means j -> i (follows causal-learn convention)
11:     """
12:     n = X.shape[1]
13:     return np.zeros((n, n))
14: # =====================================================================
icalingam baseline — editable region [READ-ONLY — reference implementation]
Lines 1–16:
1: import numpy as np
2:
3: # =====================================================================
4: # EDITABLE: implement run_causal_discovery below
5: # =====================================================================
6: def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7:     """
8:     Input: X of shape (n_samples, n_variables)
9:     Output: adjacency matrix B of shape (n_variables, n_variables)
10:     B[i, j] != 0 means j -> i (follows causal-learn convention)
11:     """
12:     import numpy as np
13:     from scipy.optimize import linear_sum_assignment
14:     from sklearn.decomposition import FastICA
15:     from sklearn.utils import check_array
16:
Assistant (Step 1): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "causal-learn/bench/custom_algorithm.py",
4 "start_line": 3,
5 "end_line": 14,
6 "content": "# =====================================================================\n# Non-Gaussianity-Adaptive Causal Discovery (NACD)\n# Novel: adaptive nonlinear function + BIC-Lasso parent selection\n# =====================================================================\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n \"\"\"\n Input: X of shape (n_samples, n_variables)\n Output: adjacency matrix B of shape (n_variables, n_variables)\n B[i, j] != 0 means j -> i (follows causal-learn convention)\n \"\"\"\n import warnings\n warnings.filterwarnings('ignore')\n from sklearn.utils import check_array\n from sklearn.linear_model import LassoLarsIC\n\n X = check_array(X, copy=True).astype(np.float64)\n n, p = X.shape\n Xs = (X - X.mean(0)) / (X.std(0) + 1e-10)\n\n # === Novel 1: Adaptive nonlinear function based on kurtosis ===\n # Light-tailed noise (Uniform, kurt < 0): x*exp(-x^2/2) captures bounded support\n # Heavy-tailed noise (Laplace/Exponential, kurt > 0): tanh is robust\n kurt = np.median(np.mean(Xs ** 4, axis=0) - 3.0)\n if kurt < -0.5:\n def g(x):\n return x * np.exp(-0.5 * x * x)\n else:\n def g(x):\n return np.tanh(x)\n\n # === DirectLiNGAM Causal Ordering (vectorized) ===\n remaining = list(range(p))\n order = []\n R = Xs.copy()\n\n for _ in range(p):\n nc = len(remaining)\n if nc <= 1:\n if nc == 1:\n order.append(remaining[0])\n break\n\n vR = np.sum(R * R, axis=0) / n # variances\n bc, bs = 0, np.inf\n\n for c in range(nc):\n if vR[c] < 1e-10:\n continue\n xj = R[:, c]\n # Regression betas for all columns on candidate root xj\n betas = (xj @ R) / (n * vR[c])\n # Residuals after removing linear effect of xj\n res = R - np.outer(xj, betas)\n # Standardize for dependence computation\n xjs = (xj - xj.mean()) / (xj.std() + 1e-10)\n rn = (res - res.mean(0)) / (res.std(0) + 1e-10)\n # Non-linear covariance dependence measure\n d = np.abs(g(xjs) @ rn / n) + np.abs(xjs @ g(rn) / n)\n d[c] = 0.0 # exclude self\n s = d.sum()\n if s < bs:\n bs, bc = s, c\n\n order.append(remaining[bc])\n # Residualize: remove effect of chosen root variable\n xm = R[:, bc]\n bm = (xm @ R) / (n * (vR[bc] + 1e-10))\n R = (R - np.outer(xm, bm))[:, [i for i in range(nc) if i != bc]]\n remaining = [remaining[i] for i in range(nc) if i != bc]\n\n # === Novel 2: Adjacency with BIC-regularized Lasso parent selection ===\n # LassoLarsIC selects sparsity via BIC, adapting to each node's\n # effective dimensionality without manual threshold tuning\n B = np.zeros((p, p))\n\n for k in range(1, p):\n j = order[k]\n anc = [order[i] for i in range(k)]\n na = len(anc)\n if na == 0:\n continue\n\n y = Xs[:, j]\n Xa = Xs[:, anc]\n\n try:\n model = LassoLarsIC(criterion='bic', max_iter=10000)\n model.fit(Xa, y)\n coefs = model.coef_\n except Exception:\n # Fallback: OLS + BIC-consistent threshold\n coefs = np.linalg.lstsq(Xa, y, rcond=None)[0]\n res = y - Xa @ coefs\n rv = np.var(res) + 1e-10\n se = np.sqrt(rv / (np.var(Xa, axis=0) * n + 1e-10)) + 1e-10\n coefs = np.where(np.abs(coefs) / se > np.sqrt(np.log(n)), coefs, 0.0)\n\n for idx in range(na):\n if abs(coefs[idx]) > 1e-8:\n B[j, anc[idx]] = coefs[idx]\n\n return B\n# ====================================================================="
7}
Tool Result (Step 1)
OK: Replaced lines 3..14 with 103 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–105.
[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–105 | total: 105 lines]
1| import numpy as np
2|
3| # =====================================================================
4| # Non-Gaussianity-Adaptive Causal Discovery (NACD)
5| # Novel: adaptive nonlinear function + BIC-Lasso parent selection
6| # =====================================================================
7| def run_causal_discovery(X: np.ndarray) -> np.ndarray:
8|     """
9|     Input: X of shape (n_samples, n_variables)
10|     Output: adjacency matrix B of shape (n_variables, n_variables)
11|     B[i, j] != 0 means j -> i (follows causal-learn convention)
12|     """
13|     import warnings
14|     warnings.filterwarnings('ignore')
15|     from sklearn.utils import check_array
16|     from sklearn.linear_model
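The kurtosis switch in this first attempt can be isolated as a small sketch (the -0.5 cutoff and median-over-columns rule are taken from the edit above; the function name is illustrative): sub-Gaussian noise such as Uniform has excess kurtosis -1.2, while Laplace (+3.0) and Exponential (+6.0) are super-Gaussian.

```python
import numpy as np

def pick_contrast(X):
    # Decide between a sub-Gaussian and a super-Gaussian ICA nonlinearity
    # from the median excess kurtosis of the standardized columns
    Xs = (X - X.mean(0)) / (X.std(0) + 1e-10)
    excess_kurtosis = np.median(np.mean(Xs ** 4, axis=0) - 3.0)
    if excess_kurtosis < -0.5:
        # light tails (e.g. Uniform): "gauss"-style contrast
        return "gauss", lambda x: x * np.exp(-0.5 * x * x)
    # heavy tails (e.g. Laplace, Exponential): tanh is robust
    return "tanh", np.tanh

rng = np.random.default_rng(0)
name_u, _ = pick_contrast(rng.uniform(size=(5000, 4)))  # -> "gauss"
name_l, _ = pick_contrast(rng.laplace(size=(5000, 4)))  # -> "tanh"
```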
Assistant (Step 2): test
Assistant (Step 1): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "causal-learn/bench/custom_algorithm.py",
4 "start_line": 3,
5 "end_line": 14,
6 "content": "# =====================================================================\n# Two-Stage LiNGAM: Non-Gaussianity Ordering + BIC-Lasso Parent Selection\n# =====================================================================\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n \"\"\"\n Stage 1: DirectLiNGAM recovers causal ordering via non-Gaussianity.\n Stage 2: BIC-penalized LARS-Lasso selects parents; OLS debiases coefficients.\n B[i, j] != 0 means j -> i.\n \"\"\"\n import warnings; warnings.filterwarnings('ignore')\n from sklearn.utils import check_array\n from sklearn.linear_model import LassoLarsIC\n from causallearn.search.FCMBased.lingam import DirectLiNGAM\n\n X = check_array(X)\n n_samples, n_features = X.shape\n\n # Stage 1: Causal ordering via DirectLiNGAM\n dlingam = DirectLiNGAM()\n dlingam.fit(X)\n order = dlingam.causal_order_\n\n # Stage 2: BIC-Lasso parent selection + OLS debiasing\n B = np.zeros((n_features, n_features))\n for i in range(1, len(order)):\n target = order[i]\n parents = list(order[:i])\n if not parents:\n continue\n y = X[:, target]\n Xp = X[:, parents]\n try:\n lasso = LassoLarsIC(criterion='bic')\n lasso.fit(Xp, y)\n sel = np.abs(lasso.coef_) > 1e-10\n if np.any(sel):\n sel_idx = [parents[j] for j in range(len(parents)) if sel[j]]\n Xs = X[:, sel_idx]\n beta = np.linalg.lstsq(Xs, y, rcond=None)[0]\n for j, s in enumerate(sel_idx):\n B[target, s] = beta[j]\n except Exception:\n for j, s in enumerate(parents):\n v = dlingam.adjacency_matrix_[target, s]\n if abs(v) > 0:\n B[target, s] = v\n return B\n# =====================================================================\n"
7}
Tool Result (Step 1)
OK: Replaced lines 3..14 with 48 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–50.
[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–50 | total: 50 lines]
1| import numpy as np
2|
3| # =====================================================================
4| # Two-Stage LiNGAM: Non-Gaussianity Ordering + BIC-Lasso Parent Selection
5| # =====================================================================
6| def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7|     """
8|     Stage 1: DirectLiNGAM recovers causal ordering via non-Gaussianity.
9|     Stage 2: BIC-penalized LARS-Lasso selects parents; OLS debiases coefficients.
10|     B[i, j] != 0 means j -> i.
11|     """
12|     import warnings; warnings.filterwarnings('ignore')
13|     from sklearn.utils import check_array
14|     from sklearn.linear_model import LassoLarsIC
15|     from causallearn.search.FCMBased.lingam
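Stage 2 of this edit can be sketched in isolation (assuming the causal order is already known; the data and function name below are illustrative): parents are selected among each node's predecessors with BIC-penalized LARS-Lasso, then the selected support is refit with plain OLS to remove the Lasso shrinkage bias.

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

def fit_parents(X, order):
    n, p = X.shape
    B = np.zeros((p, p))
    for k in range(1, p):
        target, preds = order[k], list(order[:k])
        y, Xp = X[:, target], X[:, preds]
        # BIC chooses the sparsity level; no manual threshold needed
        lasso = LassoLarsIC(criterion="bic").fit(Xp, y)
        sel = [preds[j] for j in range(len(preds)) if abs(lasso.coef_[j]) > 1e-10]
        if sel:
            # Refit the selected parents with OLS to debias the coefficients
            beta = np.linalg.lstsq(X[:, sel], y, rcond=None)[0]
            B[target, sel] = beta
    return B

rng = np.random.default_rng(0)
n = 2000
x0 = rng.laplace(size=n)
x1 = rng.laplace(size=n)
x2 = 2.0 * x0 + rng.laplace(size=n)  # only x0 is a parent of x2
B = fit_parents(np.column_stack([x0, x1, x2]), order=[0, 1, 2])
```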
Assistant (Step 2): test
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
### ER10 (scripts/eval_er10.sh)
Results (ER10):
SHD=6 F1=0.8235 Precision=0.7000 Recall=1.0000
### ER15 (scripts/eval_er15.sh)
Results (ER15):
SHD=6 F1=0.8571 Precision=0.7500 Recall=1.0000
### SF12 (scripts/eval_sf12.sh)
Results (SF12):
SHD=19 F1=0.6780 Precision=0.5128 Recall=1.0000
### ER30 (scripts/eval_er30.sh)
Results (ER30):
SHD=61 F1=0.7766 Precision=0.6347 Recall=1.0000
### ER50 (scripts/eval_er50.sh)
Results (ER50):
SHD=167 F1=0.7504 Precision=0.6005 Recall=1.0000
### ER50-LowSample (scripts/eval_er50_lowsample.sh)
Results (ER50-LowSample):
SHD=173 F1=0.5783 Precision=0.4138 Recall=0.9600
### ER20-Dense (scripts/eval_er20_dense.sh)
Results (ER20-Dense):
SHD=33 F1=0.8546 Precision=0.7462 Recall=1.0000
[Leaderboard] Results saved: {'shd_ER10': 6, 'f1_ER10': 0.8235, 'precision_ER10': 0.7, 'recall_ER10': 1.0, 'elapsed_ER10': 3, 'shd_ER15': 6, 'f1_ER15': 0.8571, 'precis
Assistant (Step 3): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "causal-learn/bench/custom_algorithm.py",
4 "start_line": 3,
5 "end_line": 50,
6 "content": "# =====================================================================\n# Robust DirectLiNGAM with Backward Stepwise Edge Refinement\n# Novel: combines non-Gaussianity-based DAG discovery (DirectLiNGAM)\n# with partial F-test backward elimination. Edges must pass both\n# LiNGAM's independence criterion AND statistical significance for\n# variance explanation, providing complementary validation layers.\n# =====================================================================\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n \"\"\"\n Stage 1: DirectLiNGAM for non-Gaussianity-based ordering and initial DAG.\n Stage 2: Backward stepwise edge refinement using partial F-tests — each\n parent must significantly reduce residual variance to be retained.\n B[i, j] != 0 means j -> i.\n \"\"\"\n import warnings; warnings.filterwarnings('ignore')\n from sklearn.utils import check_array\n from scipy import stats\n from causallearn.search.FCMBased.lingam import DirectLiNGAM\n\n X = check_array(X)\n n_samples, n_features = X.shape\n\n # Stage 1: DirectLiNGAM (ordering + non-Gaussianity-based edge estimation)\n model = DirectLiNGAM()\n model.fit(X)\n B = model.adjacency_matrix_.copy()\n\n # Stage 2: Backward stepwise refinement via partial F-tests\n # For each node with multiple parents, iteratively remove the least\n # significant parent until all remaining parents are significant.\n alpha = 0.01 # Conservative significance level\n\n for i in range(n_features):\n parents = list(np.where(B[i, :] != 0)[0])\n if len(parents) <= 1:\n continue\n\n y = X[:, i]\n current = list(parents)\n\n changed = True\n while changed and len(current) > 1:\n changed = False\n Xc = X[:, current]\n beta = np.linalg.lstsq(Xc, y, rcond=None)[0]\n rss_full = np.sum((y - Xc @ beta) ** 2)\n df_full = max(n_samples - len(current), 1)\n mse_full = rss_full / df_full\n\n # Find the least significant parent via partial F-test\n worst_pval = -1.0\n worst_idx = -1\n for j in range(len(current)):\n mask = list(range(len(current)))\n mask.remove(j)\n Xr = Xc[:, mask]\n beta_r = np.linalg.lstsq(Xr, y, rcond=None)[0]\n rss_red = np.sum((y - Xr @ beta_r) ** 2)\n\n f_stat = max((rss_red - rss_full), 0.0) / (mse_full + 1e-15)\n pval = 1.0 - stats.f.cdf(f_stat, 1, df_full)\n\n if pval > worst_pval:\n worst_pval = pval\n worst_idx = j\n\n if worst_pval > alpha and worst_idx >= 0:\n del current[worst_idx]\n changed = True\n\n # Zero out edges that were removed\n removed = set(parents) - set(current)\n for r in removed:\n B[i, r] = 0.0\n\n return B\n# =====================================================================\n"
7}
Tool Result (Step 3)
OK: Replaced lines 3..50 with 77 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–79.
[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–79 | total: 79 lines]
1| import numpy as np
2|
3| # =====================================================================
4| # Robust DirectLiNGAM with Backward Stepwise Edge Refinement
5| # Novel: combines non-Gaussianity-based DAG discovery (DirectLiNGAM)
6| # with partial F-test backward elimination. Edges must pass both
7| # LiNGAM's independence criterion AND statistical significance for
8| # variance explanation, providing complementary validation layers.
9| # =====================================================================
10| def run_causal_discovery(X: np.ndarray) -> np.ndarray:
11|     """
12|     Stage 1: DirectLiNGAM for non-Gaussianity-based ordering and initial DAG.
13|     Stage 2: Backward stepwise edge refinement using partial F-te
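The Stage-2 refinement can be sketched on its own (a toy version of the loop in the edit above; `alpha`, the data, and the function name are illustrative): the least significant candidate parent is dropped repeatedly while its partial-F p-value exceeds `alpha`.

```python
import numpy as np
from scipy import stats

def prune_parents(X, y, parents, alpha=0.01):
    n = X.shape[0]
    current = list(parents)
    while len(current) > 1:
        Xc = X[:, current]
        beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
        rss_full = np.sum((y - Xc @ beta) ** 2)
        df = max(n - len(current), 1)
        mse_full = rss_full / df
        # Partial F-test p-value for dropping each parent in turn
        pvals = []
        for j in range(len(current)):
            keep = [k for k in range(len(current)) if k != j]
            beta_r = np.linalg.lstsq(Xc[:, keep], y, rcond=None)[0]
            rss_red = np.sum((y - Xc[:, keep] @ beta_r) ** 2)
            f_stat = max(rss_red - rss_full, 0.0) / (mse_full + 1e-15)
            pvals.append(1.0 - stats.f.cdf(f_stat, 1, df))
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break  # every remaining parent is significant
        del current[worst]
    return current

rng = np.random.default_rng(0)
n = 500
X = rng.laplace(size=(n, 3))
y = 2.0 * X[:, 0] + rng.laplace(size=n)  # only column 0 truly matters
kept = prune_parents(X, y, parents=[0, 1, 2])
```

The test results that follow suggest this pruning is what recovered the precision lost by the earlier, recall-perfect but over-dense, attempts.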
Assistant (Step 4): test
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### ER10 (scripts/eval_er10.sh)
Results (ER10):
SHD=0 F1=1.0000 Precision=1.0000 Recall=1.0000
### ER15 (scripts/eval_er15.sh)
Results (ER15):
SHD=2 F1=0.9474 Precision=0.9000 Recall=1.0000
### SF12 (scripts/eval_sf12.sh)
Results (SF12):
SHD=1 F1=0.9756 Precision=0.9524 Recall=1.0000
### ER30 (scripts/eval_er30.sh)
Results (ER30):
SHD=1 F1=0.9953 Precision=0.9907 Recall=1.0000
### ER50 (scripts/eval_er50.sh)
Results (ER50):
SHD=1 F1=0.9980 Precision=0.9960 Recall=1.0000
### ER50-LowSample (scripts/eval_er50_lowsample.sh)
Results (ER50-LowSample):
SHD=23 F1=0.9064 Precision=0.8521 Recall=0.9680
### ER20-Dense (scripts/eval_er20_dense.sh)
Results (ER20-Dense):
SHD=1 F1=0.9949 Precision=0.9898 Recall=1.0000
[Leaderboard] Results saved: {'shd_ER10': 0, 'f1_ER10': 1.0, 'precision_ER10': 1.0, 'recall_ER10': 1.0, 'elapsed_ER10': 2, 'shd_ER15': 2, 'f1_ER15': 0.9474, 'precision_ER15':
Assistant (Step 5): submit
Tool Input
1{
2 "n": 2
3}
Tool Result (Step 5)
[submit] Submitting result from test #2 as final.
### ER10 (scripts/eval_er10.sh)
Results (ER10):
SHD=0 F1=1.0000 Precision=1.0000 Recall=1.0000
### ER15 (scripts/eval_er15.sh)
Results (ER15):
SHD=2 F1=0.9474 Precision=0.9000 Recall=1.0000
### SF12 (scripts/eval_sf12.sh)
Results (SF12):
SHD=1 F1=0.9756 Precision=0.9524 Recall=1.0000
### ER30 (scripts/eval_er30.sh)
Results (ER30):
SHD=1 F1=0.9953 Precision=0.9907 Recall=1.0000
### ER50 (scripts/eval_er50.sh)
Results (ER50):
SHD=1 F1=0.9980 Precision=0.9960 Recall=1.0000
### ER50-LowSample (scripts/eval_er50_lowsample.sh)
Results (ER50-LowSample):
SHD=23 F1=0.9064 Precision=0.8521 Recall=0.9680
### SF100 (scripts/eval_sf100.sh)
Results (SF100):
SHD=0 F1=1.0000 Precision=1.0000 Recall=1.0000
### ER20-Dense (scripts/eval_er20_dense.sh)
Results (ER20-Dense):
SHD=1 F1=0.9949 Precision=0.9898 Recall=1.0000
[Leaderboard] Results saved: {'shd_ER10': 0, 'f1_ER10': 1.0, 'precision_ER10': 1.0, 'reca