Agent Conversation: anthropic/claude-opus-4.6

Model: anthropic/claude-opus-4.6
Total steps: 6
Messages: 13
User (Step 0)

Task: causal-observational-nonlinear

Causal Discovery: Observational Nonlinear Data

Objective

Implement a causal discovery algorithm that recovers the DAG structure from purely observational data generated by a nonlinear Additive Noise Model (ANM). Your code goes in bench/custom_algorithm.py.

Background

Additive Noise Models (ANMs) assume structural equations of the form x_j = f_j(parents(j)) + e_j, where the f_j are nonlinear functions and the e_j are mutually independent noise terms. Under mild conditions on f_j and the noise distribution, the causal DAG is identifiable from observational data alone (Hoyer et al., 2008; Peters et al., 2014). This goes beyond LiNGAM, which assumes linear relationships, and beyond constraint- and score-based methods such as PC and GES, which in general recover only a Markov equivalence class.
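The generative assumption above can be illustrated with a small simulator. The function forms (tanh, quadratic) and exponential noise below are arbitrary illustrative choices, not the benchmark's actual generators:

```python
import numpy as np

def simulate_anm_chain(n=500, seed=0):
    """Simulate a 3-variable chain x0 -> x1 -> x2 under a nonlinear ANM.
    Each variable is a nonlinear function of its parent plus independent
    (here exponential) noise: x_j = f_j(parents(j)) + e_j."""
    rng = np.random.default_rng(seed)
    x0 = rng.exponential(1.0, n)                      # root: pure noise
    x1 = np.tanh(2.0 * x0) + rng.exponential(0.5, n)  # x1 = f1(x0) + e1
    x2 = 0.5 * x1 ** 2 + rng.exponential(0.5, n)      # x2 = f2(x1) + e2
    return np.column_stack([x0, x1, x2])

X = simulate_anm_chain()  # shape (500, 3), one column per variable
```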

Key challenges include: (1) testing independence in high-dimensional nonlinear settings, (2) scaling to larger graphs, (3) handling diverse nonlinear function types (MLP, GP, polynomial, sigmoid), and (4) robustness to different noise distributions including the harder Gaussian noise case.
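Challenge (1) is commonly addressed with a kernel independence measure such as HSIC. A minimal sketch of a biased HSIC estimate with RBF kernels and the median-squared-distance bandwidth heuristic follows; normalization conventions vary across implementations, so only relative comparisons are meaningful here:

```python
import numpy as np

def hsic(x, y):
    """Biased HSIC estimate with RBF kernels whose bandwidths follow the
    median-squared-distance heuristic. Larger values indicate stronger
    statistical dependence between x and y."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = len(x)
    dx2 = (x[:, None] - x[None, :]) ** 2
    dy2 = (y[:, None] - y[None, :]) ** 2
    K = np.exp(-dx2 / (2 * max(np.median(dx2), 1e-8)))
    L = np.exp(-dy2 / (2 * max(np.median(dy2), 1e-8)))
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / n ** 2
```

In an ANM orientation test, the statistic is applied to regression residuals: the direction whose residuals are more independent of the putative cause is preferred.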

Evaluation Scenarios

Label            Graph type       Nodes  Samples  Noise        Nonlinearity
ER8-MLP          Erdos-Renyi      8      500      Exponential  MLP
ER12-GP          Erdos-Renyi      12     1000     Laplace      GP
SF10-Mixed       Scale-Free (BA)  10     500      Uniform      Mixed
ER15-Sigmoid     Erdos-Renyi      15     1000     Exponential  Sigmoid
ER20-Mixed       Erdos-Renyi      20     2000     Laplace      Mixed
SF20-GP          Scale-Free (BA)  20     2000     Exponential  GP
ER12-LowSample   Erdos-Renyi      12     150      Laplace      Mixed
ER20-Gauss       Erdos-Renyi      20     2000     Gaussian     Mixed

Metrics

All computed on the directed edge set (skeleton + direction must be correct):

  • F1 (primary ranking metric), SHD, Precision, Recall
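Reading any nonzero B[i, j] as the directed edge j -> i, these metrics can be sketched as below. Note that SHD conventions differ: this sketch counts a reversed edge as one missing plus one extra edge, while some benchmarks count a reversal as a single error.

```python
import numpy as np

def directed_edge_metrics(B_pred, B_true):
    """SHD, precision, recall, and F1 over directed edges of two adjacency
    matrices, with nonzero B[i, j] read as the edge j -> i.  SHD here is
    (missing edges) + (extra edges), so a reversal costs two."""
    P = set(zip(*np.nonzero(B_pred)))  # predicted directed edges
    T = set(zip(*np.nonzero(B_true)))  # true directed edges
    tp = len(P & T)
    prec = tp / len(P) if P else 0.0
    rec = tp / len(T) if T else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    shd = len(P - T) + len(T - P)
    return shd, prec, rec, f1
```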

Baselines

  • cam: Causal Additive Models (Buehlmann et al., 2014) -- score-based with GAM regression
  • notears_mlp: NOTEARS with MLP parametrization (Zheng et al., 2020) -- continuous optimization
  • directlingam: DirectLiNGAM (Shimizu et al., 2011) -- linear reference baseline
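The notears_mlp baseline enforces acyclicity through a smooth penalty rather than combinatorial search. A minimal sketch of the acyclicity function h(W) = tr(exp(W ∘ W)) − d from Zheng et al. (2018), which is zero iff the weighted graph is acyclic; note that NOTEARS reads W[i, j] != 0 as i -> j, the transpose of this task's convention:

```python
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W):
    """h(W) = tr(exp(W * W)) - d, where * is the elementwise product.
    The elementwise square makes all entries nonnegative, so h(W) = 0
    exactly when the directed graph encoded by W has no cycles."""
    d = W.shape[0]
    return float(np.trace(expm(W * W))) - d

acyclic = np.array([[0.0, 1.0], [0.0, 0.0]])  # 0 -> 1, no cycle
cyclic = np.array([[0.0, 1.0], [1.0, 0.0]])   # 2-cycle
```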

causal-learn/bench/custom_algorithm.py [EDITABLE — lines 3–14 only]

     1: import numpy as np
     2: 
     3: # =====================================================================
     4: # EDITABLE: implement run_causal_discovery below
     5: # =====================================================================
     6: def run_causal_discovery(X: np.ndarray) -> np.ndarray:
     7:     """
     8:     Input:  X of shape (n_samples, n_variables)
     9:     Output: adjacency matrix B of shape (n_variables, n_variables)
    10:             B[i, j] != 0  means j -> i  (follows causal-learn convention)
    11:     """
    12:     n = X.shape[1]
    13:     return np.zeros((n, n))
    14: # =====================================================================
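As a sanity check on the stub's adjacency convention, a hand-built chain 0 -> 1 -> 2 looks like this (toy example, not part of the benchmark):

```python
import numpy as np

# Toy chain 0 -> 1 -> 2 under the docstring's convention: B[i, j] != 0
# means j -> i, so row i holds the parents of variable i.
B = np.zeros((3, 3))
B[1, 0] = 1.0   # edge 0 -> 1
B[2, 1] = 1.0   # edge 1 -> 2
parents_of = {i: list(np.nonzero(B[i])[0]) for i in range(3)}
```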

cam baseline — editable region [READ-ONLY — reference implementation]

Lines 3–14:
     1: import numpy as np
Assistant (Step 1, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "causal-learn/bench/custom_algorithm.py",
4  "start_line": 3,
5  "end_line": 14,
6  "content": "# ANM-GBR-HSIC: Pairwise ANM orientation with GBR + HSIC for causal ordering,\n# then importance-based parent selection with adaptive thresholding.\n# Novel contributions:\n# 1. GBR for flexible nonlinear regression in ANM framework (handles MLP/GP/sigmoid/mixed)\n# 2. HSIC-based pairwise orientation with weighted tournament ranking\n# 3. Adaptive importance threshold for parent selection\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n    \"\"\"\n    Input:  X of shape (n_samples, n_variables)\n    Output: adjacency matrix B of shape (n_variables, n_variables)\n            B[i, j] != 0  means j -> i  (follows causal-learn convention)\n    \"\"\"\n    from sklearn.ensemble import GradientBoostingRegressor\n    from sklearn.preprocessing import StandardScaler\n    import warnings\n    warnings.filterwarnings('ignore')\n\n    n_samples, n_vars = X.shape\n    if n_vars <= 1:\n        return np.zeros((n_vars, n_vars))\n\n    X = StandardScaler().fit_transform(X)\n\n    # Adaptive configuration\n    hsic_n = min(n_samples, 500)\n    n_trees = min(80, max(20, n_samples // 12))\n    min_leaf = max(5, n_samples // 40)\n\n    def hsic_score(x, y):\n        \"\"\"HSIC statistic with RBF kernel and median bandwidth heuristic.\"\"\"\n        rng = np.random.RandomState(42)\n        n = len(x)\n        if n > hsic_n:\n            idx = rng.choice(n, hsic_n, replace=False)\n            x, y = x[idx], y[idx]\n            n = hsic_n\n        x, y = x.ravel(), y.ravel()\n        dx2 = (x[:, None] - x[None, :]) ** 2\n        dy2 = (y[:, None] - y[None, :]) ** 2\n        sx = max(np.median(dx2), 1e-8)\n        sy = max(np.median(dy2), 1e-8)\n        K = np.exp(-dx2 / (2 * sx))\n        L = np.exp(-dy2 / (2 * sy))\n        Kc = K - K.mean(0, keepdims=True) - K.mean(1, keepdims=True) + K.mean()\n        Lc = L - L.mean(0, keepdims=True) - L.mean(1, keepdims=True) + L.mean()\n        return float((Kc * Lc).sum()) / (n * n)\n\n    def make_gbr():\n        return 
GradientBoostingRegressor(\n            n_estimators=n_trees, max_depth=3, learning_rate=0.1,\n            subsample=0.8, min_samples_leaf=min_leaf, random_state=0)\n\n    # === Step 1: Pairwise ANM Orientation ===\n    # For each pair (i,j), fit both directions and compare residual independence\n    wins = np.zeros(n_vars)\n    for i in range(n_vars):\n        for j in range(i + 1, n_vars):\n            # Direction j -> i: fit X_i = f(X_j) + e\n            g = make_gbr(); g.fit(X[:, j:j+1], X[:, i])\n            h_ji = hsic_score(X[:, i] - g.predict(X[:, j:j+1]), X[:, j])\n\n            # Direction i -> j: fit X_j = g(X_i) + e\n            g = make_gbr(); g.fit(X[:, i:i+1], X[:, j])\n            h_ij = hsic_score(X[:, j] - g.predict(X[:, i:i+1]), X[:, i])\n\n            # Lower HSIC = more independent residuals = correct direction\n            if h_ji < h_ij:\n                wins[j] += 1  # j -> i, so j is more root-like\n            else:\n                wins[i] += 1  # i -> j, so i is more root-like\n\n    # Tournament ranking: most wins = most root-like = earliest in ordering\n    causal_order = list(np.argsort(-wins))\n\n    # === Step 2: Parent Selection with Importance-based Pruning ===\n    B = np.zeros((n_vars, n_vars))\n    for idx in range(1, n_vars):\n        target = causal_order[idx]\n        cands = causal_order[:idx]  # predecessors in causal order\n\n        Xp = X[:, cands]\n        ss_tot = np.sum((X[:, target] - X[:, target].mean()) ** 2)\n        if ss_tot < 1e-10:\n            continue\n\n        g = make_gbr(); g.fit(Xp, X[:, target])\n        yhat = g.predict(Xp)\n        r2 = 1 - np.sum((X[:, target] - yhat) ** 2) / ss_tot\n\n        if r2 < 0.03:  # no meaningful relationship\n            continue\n\n        # Adaptive importance threshold\n        imp = g.feature_importances_\n        threshold = max(0.05, 1.0 / (len(cands) + 1))\n\n        for k, p in enumerate(cands):\n            if imp[k] > threshold:\n                B[target, p] = 
1.0\n\n    return B\n"
7}
Tool Result (Step 1)
OK: Replaced lines 3..14 with 101 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–103.

[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–103 | total: 103 lines]
     1| import numpy as np
     2| 
     3  # ANM-GBR-HSIC: Pairwise ANM orientation with GBR + HSIC for causal ordering,
     4  # then importance-based parent selection with adaptive thresholding.
     5  # Novel contributions:
     6  # 1. GBR for flexible nonlinear regression in ANM framework (handles MLP/GP/sigmoid/mixed)
     7  # 2. HSIC-based pairwise orientation with weighted tournament ranking
     8  # 3. Adaptive importance threshold for parent selection
     9  def run_causal_discovery(X: np.ndarray) -> np.ndarray:
    10      """
    11      Input:  X of shape (n_samples, n_variables)
    12      Output: adjacency matrix B of shape (n_variables, n_variables)
    13              B[i, j] != 0  means j -> i  (follows causal-learn convention)
    14      """
    15      from skl
Assistant (Step 2, test)
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

### ER8-MLP (scripts/eval_er8_mlp.sh)
Results (ER8-MLP):
  SHD=8  F1=0.4444  Precision=0.3636  Recall=0.5714

### ER12-GP (scripts/eval_er12_gp.sh)
Results (ER12-GP):
  SHD=13  F1=0.7083  Precision=0.5862  Recall=0.8947

### SF10-Mixed (scripts/eval_sf10_mixed.sh)
Results (SF10-Mixed):
  SHD=16  F1=0.4000  Precision=0.4286  Recall=0.3750

### ER15-Sigmoid (scripts/eval_er15_sigmoid.sh)
Results (ER15-Sigmoid):
  SHD=20  F1=0.6769  Precision=0.5641  Recall=0.8462

### ER20-Mixed (scripts/eval_er20_mixed.sh)
Results (ER20-Mixed):
  SHD=57  F1=0.2913  Precision=0.3333  Recall=0.2586

### SF20-GP (scripts/eval_sf20_gp.sh)
Results (SF20-GP):
  SHD=16  F1=0.7200  Precision=0.6923  Recall=0.7500

### ER12-LowSample (scripts/eval_er12_lowsample.sh)
Results (ER12-LowSample):
  SHD=24  F1=0.2857  Precision=0.2609  Recall=0.3158

[Leaderboard] Results saved: {'shd_ER8-MLP': 8, 'f1_ER8-MLP': 0.4444, 'precision_ER8-MLP': 0.3636, 
Assistant (Step 3, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "causal-learn/bench/custom_algorithm.py",
4  "start_line": 3,
5  "end_line": 103,
6  "content": "# ANM-GBR-HSIC v2: Improved causal ordering + permutation-importance parent selection\n# Novel contributions:\n# 1. HSIC-sum root scoring for robust causal ordering (not binary tournament)\n# 2. Pairwise R^2 screening to reduce candidate set before multivariate fitting\n# 3. Held-out permutation importance for principled parent selection\n# 4. Dual-criterion edge inclusion: requires both pairwise evidence AND multivariate contribution\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n    \"\"\"\n    Input:  X of shape (n_samples, n_variables)\n    Output: adjacency matrix B of shape (n_variables, n_variables)\n            B[i, j] != 0  means j -> i  (follows causal-learn convention)\n    \"\"\"\n    from sklearn.ensemble import GradientBoostingRegressor\n    from sklearn.preprocessing import StandardScaler\n    import warnings\n    warnings.filterwarnings('ignore')\n\n    n_samples, n_vars = X.shape\n    if n_vars <= 1:\n        return np.zeros((n_vars, n_vars))\n\n    X = StandardScaler().fit_transform(X)\n\n    # Adaptive configuration\n    hsic_n = min(n_samples, 500)\n    n_trees = min(80, max(20, n_samples // 12))\n    min_leaf = max(5, n_samples // 40)\n\n    def hsic_score(x, y):\n        \"\"\"HSIC with RBF kernel and median bandwidth heuristic.\"\"\"\n        rng = np.random.RandomState(42)\n        n = len(x)\n        if n > hsic_n:\n            idx = rng.choice(n, hsic_n, replace=False)\n            x, y = x[idx], y[idx]\n            n = hsic_n\n        x, y = x.ravel(), y.ravel()\n        dx2 = (x[:, None] - x[None, :]) ** 2\n        dy2 = (y[:, None] - y[None, :]) ** 2\n        sx = max(np.median(dx2), 1e-8)\n        sy = max(np.median(dy2), 1e-8)\n        K = np.exp(-dx2 / (2 * sx))\n        L = np.exp(-dy2 / (2 * sy))\n        Kc = K - K.mean(0, keepdims=True) - K.mean(1, keepdims=True) + K.mean()\n        Lc = L - L.mean(0, keepdims=True) - L.mean(1, keepdims=True) + L.mean()\n        return float((Kc * Lc).sum()) / (n * n)\n\n 
   def make_gbr():\n        return GradientBoostingRegressor(\n            n_estimators=n_trees, max_depth=3, learning_rate=0.1,\n            subsample=0.8, min_samples_leaf=min_leaf, random_state=0)\n\n    # === Step 1: Pairwise ANM tests — store HSIC and R^2 ===\n    h_cause = np.zeros((n_vars, n_vars))   # h_cause[c,e] = HSIC when c causes e\n    r2_cause = np.zeros((n_vars, n_vars))  # r2_cause[c,e] = training R^2 when c causes e\n    ss = np.array([np.sum((X[:, i] - X[:, i].mean()) ** 2) + 1e-10\n                    for i in range(n_vars)])\n\n    for i in range(n_vars):\n        for j in range(i + 1, n_vars):\n            # Direction j -> i\n            g = make_gbr(); g.fit(X[:, j:j+1], X[:, i])\n            p = g.predict(X[:, j:j+1])\n            h_cause[j, i] = hsic_score(X[:, i] - p, X[:, j])\n            r2_cause[j, i] = max(0, 1 - np.sum((X[:, i] - p) ** 2) / ss[i])\n\n            # Direction i -> j\n            g = make_gbr(); g.fit(X[:, i:i+1], X[:, j])\n            p = g.predict(X[:, i:i+1])\n            h_cause[i, j] = hsic_score(X[:, j] - p, X[:, i])\n            r2_cause[i, j] = max(0, 1 - np.sum((X[:, j] - p) ** 2) / ss[j])\n\n    # Root score: sum of HSIC when variable acts as cause — lower = more root-like\n    root_score = h_cause.sum(axis=1)\n    order = list(np.argsort(root_score))\n    pos = {v: i for i, v in enumerate(order)}\n\n    # === Step 2: Parent selection with screening + permutation importance ===\n    B = np.zeros((n_vars, n_vars))\n\n    for target in range(n_vars):\n        # Screen: predecessors with pairwise R^2 evidence\n        cands = []\n        for p in range(n_vars):\n            if p == target or pos[p] >= pos[target]:\n                continue\n            if max(r2_cause[p, target], r2_cause[target, p]) > 0.01:\n                cands.append(p)\n\n        if not cands:\n            continue\n\n        if len(cands) == 1:\n            if r2_cause[cands[0], target] > 0.03:\n                B[target, cands[0]] = 1.0\n    
        continue\n\n        # Permutation importance on held-out data (3 random splits)\n        n_splits = 3\n        perm_imp = np.zeros(len(cands))\n        valid = 0\n\n        for s in range(n_splits):\n            rng_s = np.random.RandomState(s * 100)\n            idx = rng_s.permutation(n_samples)\n            nt = int(0.75 * n_samples) if n_samples >= 300 else int(0.8 * n_samples)\n            nt = max(min(nt, n_samples - 15), n_samples // 2)\n            ti, vi = idx[:nt], idx[nt:]\n\n            if len(vi) < 10:\n                continue\n\n            g = make_gbr()\n            g.fit(X[ti][:, cands], X[ti, target])\n            yh = g.predict(X[vi][:, cands])\n            bm = np.mean((X[vi, target] - yh) ** 2)\n            vv = np.var(X[vi, target])\n\n            if bm < 1e-10 or vv < 1e-10 or bm > vv * 0.97:\n                continue\n\n            valid += 1\n            for k in range(len(cands)):\n                Xp = X[vi][:, cands].copy()\n                np.random.RandomState(s * 1000 + k).shuffle(Xp[:, k])\n                pm = np.mean((X[vi, target] - g.predict(Xp)) ** 2)\n                perm_imp[k] += (pm - bm) / (bm + 1e-10)\n\n        if valid > 0:\n            perm_imp /= valid\n            for k, p in enumerate(cands):\n                if perm_imp[k] > 0.03:\n                    B[target, p] = 1.0\n        else:\n            # Fallback: feature importance with conservative threshold\n            g = make_gbr()\n            g.fit(X[:, cands], X[:, target])\n            yh = g.predict(X[:, cands])\n            r2 = 1 - np.sum((X[:, target] - yh) ** 2) / ss[target]\n            if r2 > 0.03:\n                imp = g.feature_importances_\n                th = max(0.08, 1.0 / (len(cands) + 1))\n                for k, p in enumerate(cands):\n                    if imp[k] > th:\n                        B[target, p] = 1.0\n\n    return B\n"
7}
Tool Result (Step 3)
OK: Replaced lines 3..103 with 147 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–149.

[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–149 | total: 149 lines]
     1| import numpy as np
     2| 
     3  # ANM-GBR-HSIC v2: Improved causal ordering + permutation-importance parent selection
     4  # Novel contributions:
     5  # 1. HSIC-sum root scoring for robust causal ordering (not binary tournament)
     6  # 2. Pairwise R^2 screening to reduce candidate set before multivariate fitting
     7  # 3. Held-out permutation importance for principled parent selection
     8  # 4. Dual-criterion edge inclusion: requires both pairwise evidence AND multivariate contribution
     9  def run_causal_discovery(X: np.ndarray) -> np.ndarray:
    10      """
    11      Input:  X of shape (n_samples, n_variables)
    12      Output: adjacency matrix B of shape (n_variables, n_variables)
    13              B[i, j] != 0  means j -> i  (follows causal-learn 
Assistant (Step 4, test)
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### ER8-MLP (scripts/eval_er8_mlp.sh)
Results (ER8-MLP):
  SHD=6  F1=0.7000  Precision=0.5385  Recall=1.0000

### ER12-GP (scripts/eval_er12_gp.sh)
Results (ER12-GP):
  SHD=21  F1=0.4898  Precision=0.4000  Recall=0.6316

### SF10-Mixed (scripts/eval_sf10_mixed.sh)
Results (SF10-Mixed):
  SHD=18  F1=0.4000  Precision=0.3333  Recall=0.5000

### ER15-Sigmoid (scripts/eval_er15_sigmoid.sh)
Results (ER15-Sigmoid):
  SHD=25  F1=0.6216  Precision=0.4792  Recall=0.8846

### ER20-Mixed (scripts/eval_er20_mixed.sh)
Results (ER20-Mixed):
  SHD=64  F1=0.3520  Precision=0.3284  Recall=0.3793

### SF20-GP (scripts/eval_sf20_gp.sh)
Results (SF20-GP):
  SHD=20  F1=0.6329  Precision=0.5814  Recall=0.6944

### ER12-LowSample (scripts/eval_er12_lowsample.sh)
Results (ER12-LowSample):
  SHD=33  F1=0.3509  Precision=0.2632  Recall=0.5263

[Leaderboard] Results saved: {'shd_ER8-MLP': 6, 'f1_ER8-MLP': 0.7, 'precision_ER8-MLP': 0.5385, 'rec
Assistant (Step 5, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "causal-learn/bench/custom_algorithm.py",
4  "start_line": 3,
5  "end_line": 149,
6  "content": "# ANM-GBR-HSIC v3: Binary tournament ordering + R^2-screened parent selection\n# Novel contributions:\n# 1. Pairwise ANM tests with GBR + HSIC for flexible nonlinear causal orientation\n# 2. Pairwise R^2 pre-screening to focus multivariate fits on plausible parents\n# 3. Adaptive importance thresholding with screening-aware calibration\n# 4. Dual-use of pairwise fits: both orientation (HSIC) and screening (R^2)\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n    \"\"\"\n    Input:  X of shape (n_samples, n_variables)\n    Output: adjacency matrix B of shape (n_variables, n_variables)\n            B[i, j] != 0  means j -> i  (follows causal-learn convention)\n    \"\"\"\n    from sklearn.ensemble import GradientBoostingRegressor\n    from sklearn.preprocessing import StandardScaler\n    import warnings\n    warnings.filterwarnings('ignore')\n\n    n_samples, n_vars = X.shape\n    if n_vars <= 1:\n        return np.zeros((n_vars, n_vars))\n\n    X = StandardScaler().fit_transform(X)\n\n    # Adaptive configuration\n    hsic_n = min(n_samples, 500)\n    n_trees = min(80, max(20, n_samples // 12))\n    min_leaf = max(5, n_samples // 40)\n\n    def hsic_score(x, y):\n        \"\"\"HSIC with RBF kernel and median bandwidth heuristic.\"\"\"\n        rng = np.random.RandomState(42)\n        n = len(x)\n        if n > hsic_n:\n            idx = rng.choice(n, hsic_n, replace=False)\n            x, y = x[idx], y[idx]\n            n = hsic_n\n        x, y = x.ravel(), y.ravel()\n        dx2 = (x[:, None] - x[None, :]) ** 2\n        dy2 = (y[:, None] - y[None, :]) ** 2\n        sx = max(np.median(dx2), 1e-8)\n        sy = max(np.median(dy2), 1e-8)\n        K = np.exp(-dx2 / (2 * sx))\n        L = np.exp(-dy2 / (2 * sy))\n        Kc = K - K.mean(0, keepdims=True) - K.mean(1, keepdims=True) + K.mean()\n        Lc = L - L.mean(0, keepdims=True) - L.mean(1, keepdims=True) + L.mean()\n        return float((Kc * Lc).sum()) / (n * n)\n\n    def make_gbr():\n    
    return GradientBoostingRegressor(\n            n_estimators=n_trees, max_depth=3, learning_rate=0.1,\n            subsample=0.8, min_samples_leaf=min_leaf, random_state=0)\n\n    # === Step 1: Pairwise ANM tests — orientation + R^2 for screening ===\n    wins = np.zeros(n_vars)\n    r2_pair = np.zeros((n_vars, n_vars))  # r2_pair[c,e] = R^2 when c causes e\n    ss = np.array([np.sum((X[:, i] - X[:, i].mean()) ** 2) + 1e-10\n                    for i in range(n_vars)])\n\n    for i in range(n_vars):\n        for j in range(i + 1, n_vars):\n            # Direction j -> i\n            g = make_gbr(); g.fit(X[:, j:j+1], X[:, i])\n            pred = g.predict(X[:, j:j+1])\n            h_ji = hsic_score(X[:, i] - pred, X[:, j])\n            r2_pair[j, i] = max(0, 1 - np.sum((X[:, i] - pred) ** 2) / ss[i])\n\n            # Direction i -> j\n            g = make_gbr(); g.fit(X[:, i:i+1], X[:, j])\n            pred = g.predict(X[:, i:i+1])\n            h_ij = hsic_score(X[:, j] - pred, X[:, i])\n            r2_pair[i, j] = max(0, 1 - np.sum((X[:, j] - pred) ** 2) / ss[j])\n\n            # Lower HSIC = more independent residuals = correct direction\n            if h_ji < h_ij:\n                wins[j] += 1  # j -> i\n            else:\n                wins[i] += 1  # i -> j\n\n    # Binary tournament ranking (most wins = most root-like = first)\n    causal_order = list(np.argsort(-wins))\n\n    # === Step 2: Screened parent selection with importance-based pruning ===\n    B = np.zeros((n_vars, n_vars))\n    order_pos = {v: i for i, v in enumerate(causal_order)}\n\n    for idx in range(1, n_vars):\n        target = causal_order[idx]\n        all_preds = causal_order[:idx]\n\n        # Screen: only predecessors with pairwise R^2 evidence\n        cands = []\n        for p in all_preds:\n            if max(r2_pair[p, target], r2_pair[target, p]) > 0.01:\n                cands.append(p)\n\n        if not cands:\n            continue\n\n        if len(cands) == 1:\n           
 # Single candidate: require moderate R^2\n            if r2_pair[cands[0], target] > 0.04:\n                B[target, cands[0]] = 1.0\n            continue\n\n        # Fit GBR with screened candidates\n        Xp = X[:, cands]\n        g = make_gbr(); g.fit(Xp, X[:, target])\n        yhat = g.predict(Xp)\n        r2 = 1 - np.sum((X[:, target] - yhat) ** 2) / ss[target]\n\n        if r2 < 0.03:\n            continue\n\n        # Feature importance with screening-aware adaptive threshold\n        imp = g.feature_importances_\n        threshold = max(0.07, 1.2 / (len(cands) + 1))\n\n        for k, p in enumerate(cands):\n            if imp[k] > threshold:\n                B[target, p] = 1.0\n\n    return B\n"
7}
Tool Result (Step 5)
OK: Replaced lines 3..149 with 122 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–124.

[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–124 | total: 124 lines]
     1| import numpy as np
     2| 
     3  # ANM-GBR-HSIC v3: Binary tournament ordering + R^2-screened parent selection
     4  # Novel contributions:
     5  # 1. Pairwise ANM tests with GBR + HSIC for flexible nonlinear causal orientation
     6  # 2. Pairwise R^2 pre-screening to focus multivariate fits on plausible parents
     7  # 3. Adaptive importance thresholding with screening-aware calibration
     8  # 4. Dual-use of pairwise fits: both orientation (HSIC) and screening (R^2)
     9  def run_causal_discovery(X: np.ndarray) -> np.ndarray:
    10      """
    11      Input:  X of shape (n_samples, n_variables)
    12      Output: adjacency matrix B of shape (n_variables, n_variables)
    13              B[i, j] != 0  means j -> i  (follows causal-learn convention)
    14      
Assistant (Step 6, test)
Tool Result (Step 6)
[Test #3] (0 tests remaining)

### ER8-MLP (scripts/eval_er8_mlp.sh)
Results (ER8-MLP):
  SHD=9  F1=0.3529  Precision=0.3000  Recall=0.4286

### ER12-GP (scripts/eval_er12_gp.sh)
Results (ER12-GP):
  SHD=9  F1=0.7500  Precision=0.7143  Recall=0.7895

### SF10-Mixed (scripts/eval_sf10_mixed.sh)
Results (SF10-Mixed):
  SHD=16  F1=0.3571  Precision=0.4167  Recall=0.3125

### ER15-Sigmoid (scripts/eval_er15_sigmoid.sh)
Results (ER15-Sigmoid):
  SHD=20  F1=0.6441  Precision=0.5758  Recall=0.7308

### ER20-Mixed (scripts/eval_er20_mixed.sh)
Results (ER20-Mixed):
  SHD=58  F1=0.2474  Precision=0.3077  Recall=0.2069

### SF20-GP (scripts/eval_sf20_gp.sh)
Results (SF20-GP):
  SHD=13  F1=0.7429  Precision=0.7647  Recall=0.7222

### ER12-LowSample (scripts/eval_er12_lowsample.sh)
Results (ER12-LowSample):
  SHD=24  F1=0.2162  Precision=0.2222  Recall=0.2105

### ER20-Gauss (scripts/eval_er20_gauss.sh)
Results (ER20-Gauss):
  SHD=50  F1=0.3600  Precision=0.4286  Recall=0.3103

[Leaderboard] Result