Agent Conversation: anthropic/claude-opus-4.6

Model: anthropic/claude-opus-4.6
Total steps: 6
Messages: 13
User (Step 0)

Task: causal-observational-nonlinear

Causal Discovery: Observational Nonlinear Data

Objective

Implement a causal discovery algorithm that recovers the DAG structure from purely observational data generated by a nonlinear Additive Noise Model (ANM). Your code goes in bench/custom_algorithm.py.

Background

Additive Noise Models (ANMs) assume structural equations of the form x_j = f_j(parents(j)) + e_j, where the f_j are nonlinear functions and the e_j are mutually independent noise terms. Under mild conditions on f_j and the noise distribution, the causal DAG is identifiable from observational data alone (Hoyer et al., 2008; Peters et al., 2014). This goes beyond LiNGAM, which assumes linear relationships, and beyond constraint- and score-based methods such as PC and GES, which in general recover only a Markov equivalence class.
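The generative assumption above can be illustrated with a small simulator. The function forms (tanh, quadratic) and exponential noise below are arbitrary illustrative choices, not the benchmark's actual generators:

```python
import numpy as np

def simulate_anm_chain(n=500, seed=0):
    """Simulate a 3-variable chain x0 -> x1 -> x2 under a nonlinear ANM.
    Each variable is a nonlinear function of its parent plus independent
    (here exponential) noise: x_j = f_j(parents(j)) + e_j."""
    rng = np.random.default_rng(seed)
    x0 = rng.exponential(1.0, n)                      # root: pure noise
    x1 = np.tanh(2.0 * x0) + rng.exponential(0.5, n)  # x1 = f1(x0) + e1
    x2 = 0.5 * x1 ** 2 + rng.exponential(0.5, n)      # x2 = f2(x1) + e2
    return np.column_stack([x0, x1, x2])

X = simulate_anm_chain()  # shape (500, 3), one column per variable
```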

Key challenges include: (1) testing independence in high-dimensional nonlinear settings, (2) scaling to larger graphs, (3) handling diverse nonlinear function types (MLP, GP, polynomial, sigmoid), and (4) robustness to different noise distributions including the harder Gaussian noise case.
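Challenge (1) is commonly addressed with a kernel independence measure such as HSIC. A minimal sketch of a biased HSIC estimate with RBF kernels and the median-squared-distance bandwidth heuristic follows; normalization conventions vary across implementations, so only relative comparisons are meaningful here:

```python
import numpy as np

def hsic(x, y):
    """Biased HSIC estimate with RBF kernels whose bandwidths follow the
    median-squared-distance heuristic. Larger values indicate stronger
    statistical dependence between x and y."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = len(x)
    dx2 = (x[:, None] - x[None, :]) ** 2
    dy2 = (y[:, None] - y[None, :]) ** 2
    K = np.exp(-dx2 / (2 * max(np.median(dx2), 1e-8)))
    L = np.exp(-dy2 / (2 * max(np.median(dy2), 1e-8)))
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / n ** 2
```

In an ANM orientation test, the statistic is applied to regression residuals: the direction whose residuals are more independent of the putative cause is preferred.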

Evaluation Scenarios

Label            Graph type       Nodes  Samples  Noise        Nonlinearity
ER8-MLP          Erdos-Renyi      8      500      Exponential  MLP
ER12-GP          Erdos-Renyi      12     1000     Laplace      GP
SF10-Mixed       Scale-Free (BA)  10     500      Uniform      Mixed
ER15-Sigmoid     Erdos-Renyi      15     1000     Exponential  Sigmoid
ER20-Mixed       Erdos-Renyi      20     2000     Laplace      Mixed
SF20-GP          Scale-Free (BA)  20     2000     Exponential  GP
ER12-LowSample   Erdos-Renyi      12     150      Laplace      Mixed
ER20-Gauss       Erdos-Renyi      20     2000     Gaussian     Mixed

Metrics

All computed on the directed edge set (skeleton + direction must be correct):

  • F1 (primary ranking metric), SHD, Precision, Recall
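Reading any nonzero B[i, j] as the directed edge j -> i, these metrics can be sketched as below. Note that SHD conventions differ: this sketch counts a reversed edge as one missing plus one extra edge, while some benchmarks count a reversal as a single error.

```python
import numpy as np

def directed_edge_metrics(B_pred, B_true):
    """SHD, precision, recall, and F1 over directed edges of two adjacency
    matrices, with nonzero B[i, j] read as the edge j -> i.  SHD here is
    (missing edges) + (extra edges), so a reversal costs two."""
    P = set(zip(*np.nonzero(B_pred)))  # predicted directed edges
    T = set(zip(*np.nonzero(B_true)))  # true directed edges
    tp = len(P & T)
    prec = tp / len(P) if P else 0.0
    rec = tp / len(T) if T else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    shd = len(P - T) + len(T - P)
    return shd, prec, rec, f1
```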

Baselines

  • cam: Causal Additive Models (Buehlmann et al., 2014) -- score-based with GAM regression
  • notears_mlp: NOTEARS with MLP parametrization (Zheng et al., 2020) -- continuous optimization
  • directlingam: DirectLiNGAM (Shimizu et al., 2011) -- linear reference baseline
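The notears_mlp baseline enforces acyclicity through a smooth penalty rather than combinatorial search. A minimal sketch of the acyclicity function h(W) = tr(exp(W ∘ W)) − d from Zheng et al. (2018), which is zero iff the weighted graph is acyclic; note that NOTEARS reads W[i, j] != 0 as i -> j, the transpose of this task's convention:

```python
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W):
    """h(W) = tr(exp(W * W)) - d, where * is the elementwise product.
    The elementwise square makes all entries nonnegative, so h(W) = 0
    exactly when the directed graph encoded by W has no cycles."""
    d = W.shape[0]
    return float(np.trace(expm(W * W))) - d

acyclic = np.array([[0.0, 1.0], [0.0, 0.0]])  # 0 -> 1, no cycle
cyclic = np.array([[0.0, 1.0], [1.0, 0.0]])   # 2-cycle
```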

causal-learn/bench/custom_algorithm.py [EDITABLE — lines 3–14 only]

     1: import numpy as np
     2: 
     3: # =====================================================================
     4: # EDITABLE: implement run_causal_discovery below
     5: # =====================================================================
     6: def run_causal_discovery(X: np.ndarray) -> np.ndarray:
     7:     """
     8:     Input:  X of shape (n_samples, n_variables)
     9:     Output: adjacency matrix B of shape (n_variables, n_variables)
    10:             B[i, j] != 0  means j -> i  (follows causal-learn convention)
    11:     """
    12:     n = X.shape[1]
    13:     return np.zeros((n, n))
    14: # =====================================================================
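As a sanity check on the stub's adjacency convention, a hand-built chain 0 -> 1 -> 2 looks like this (toy example, not part of the benchmark):

```python
import numpy as np

# Toy chain 0 -> 1 -> 2 under the docstring's convention: B[i, j] != 0
# means j -> i, so row i holds the parents of variable i.
B = np.zeros((3, 3))
B[1, 0] = 1.0   # edge 0 -> 1
B[2, 1] = 1.0   # edge 1 -> 2
parents_of = {i: list(np.nonzero(B[i])[0]) for i in range(3)}
```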

cam baseline — editable region [READ-ONLY — reference implementation]

Lines 3–14:
     1: import numpy as np
Assistant (Step 1, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "causal-learn/bench/custom_algorithm.py",
4  "start_line": 3,
5  "end_line": 14,
6  "content": "# ANM-GBR-HSIC: Pairwise ANM orientation with GBR + HSIC for causal ordering,\n# then importance-based parent selection with adaptive thresholding.\n# Novel contributions:\n# 1. GBR for flexible nonlinear regression in ANM framework (handles MLP/GP/sigmoid/mixed)\n# 2. HSIC-based pairwise orientation with weighted tournament ranking\n# 3. Adaptive importance threshold for parent selection\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n    \"\"\"\n    Input:  X of shape (n_samples, n_variables)\n    Output: adjacency matrix B of shape (n_variables, n_variables)\n            B[i, j] != 0  means j -> i  (follows causal-learn convention)\n    \"\"\"\n    from sklearn.ensemble import GradientBoostingRegressor\n    from sklearn.preprocessing import StandardScaler\n    import warnings\n    warnings.filterwarnings('ignore')\n\n    n_samples, n_vars = X.shape\n    if n_vars <= 1:\n        return np.zeros((n_vars, n_vars))\n\n    X = StandardScaler().fit_transform(X)\n\n    # Adaptive configuration\n    hsic_n = min(n_samples, 500)\n    n_trees = min(80, max(20, n_samples // 12))\n    min_leaf = max(5, n_samples // 40)\n\n    def hsic_score(x, y):\n        \"\"\"HSIC statistic with RBF kernel and median bandwidth heuristic.\"\"\"\n        rng = np.random.RandomState(42)\n        n = len(x)\n        if n > hsic_n:\n            idx = rng.choice(n, hsic_n, replace=False)\n            x, y = x[idx], y[idx]\n            n = hsic_n\n        x, y = x.ravel(), y.ravel()\n        dx2 = (x[:, None] - x[None, :]) ** 2\n        dy2 = (y[:, None] - y[None, :]) ** 2\n        sx = max(np.median(dx2), 1e-8)\n        sy = max(np.median(dy2), 1e-8)\n        K = np.exp(-dx2 / (2 * sx))\n        L = np.exp(-dy2 / (2 * sy))\n        Kc = K - K.mean(0, keepdims=True) - K.mean(1, keepdims=True) + K.mean()\n        Lc = L - L.mean(0, keepdims=True) - L.mean(1, keepdims=True) + L.mean()\n        return float((Kc * Lc).sum()) / (n * n)\n\n    def make_gbr():\n        return 
GradientBoostingRegressor(\n            n_estimators=n_trees, max_depth=3, learning_rate=0.1,\n            subsample=0.8, min_samples_leaf=min_leaf, random_state=0)\n\n    # === Step 1: Pairwise ANM Orientation ===\n    # For each pair (i,j), fit both directions and compare residual independence\n    wins = np.zeros(n_vars)\n    for i in range(n_vars):\n        for j in range(i + 1, n_vars):\n            # Direction j -> i: fit X_i = f(X_j) + e\n            g = make_gbr(); g.fit(X[:, j:j+1], X[:, i])\n            h_ji = hsic_score(X[:, i] - g.predict(X[:, j:j+1]), X[:, j])\n\n            # Direction i -> j: fit X_j = g(X_i) + e\n            g = make_gbr(); g.fit(X[:, i:i+1], X[:, j])\n            h_ij = hsic_score(X[:, j] - g.predict(X[:, i:i+1]), X[:, i])\n\n            # Lower HSIC = more independent residuals = correct direction\n            if h_ji < h_ij:\n                wins[j] += 1  # j -> i, so j is more root-like\n            else:\n                wins[i] += 1  # i -> j, so i is more root-like\n\n    # Tournament ranking: most wins = most root-like = earliest in ordering\n    causal_order = list(np.argsort(-wins))\n\n    # === Step 2: Parent Selection with Importance-based Pruning ===\n    B = np.zeros((n_vars, n_vars))\n    for idx in range(1, n_vars):\n        target = causal_order[idx]\n        cands = causal_order[:idx]  # predecessors in causal order\n\n        Xp = X[:, cands]\n        ss_tot = np.sum((X[:, target] - X[:, target].mean()) ** 2)\n        if ss_tot < 1e-10:\n            continue\n\n        g = make_gbr(); g.fit(Xp, X[:, target])\n        yhat = g.predict(Xp)\n        r2 = 1 - np.sum((X[:, target] - yhat) ** 2) / ss_tot\n\n        if r2 < 0.03:  # no meaningful relationship\n            continue\n\n        # Adaptive importance threshold\n        imp = g.feature_importances_\n        threshold = max(0.05, 1.0 / (len(cands) + 1))\n\n        for k, p in enumerate(cands):\n            if imp[k] > threshold:\n                B[target, p] = 
1.0\n\n    return B\n"
7}
Tool Result (Step 1)
OK: Replaced lines 3..14 with 101 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–103.

[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–103 | total: 103 lines]
     1| import numpy as np
     2| 
     3  # ANM-GBR-HSIC: Pairwise ANM orientation with GBR + HSIC for causal ordering,
     4  # then importance-based parent selection with adaptive thresholding.
     5  # Novel contributions:
     6  # 1. GBR for flexible nonlinear regression in ANM framework (handles MLP/GP/sigmoid/mixed)
     7  # 2. HSIC-based pairwise orientation with weighted tournament ranking
     8  # 3. Adaptive importance threshold for parent selection
     9  def run_causal_discovery(X: np.ndarray) -> np.ndarray:
    10      """
    11      Input:  X of shape (n_samples, n_variables)
    12      Output: adjacency matrix B of shape (n_variables, n_variables)
    13              B[i, j] != 0  means j -> i  (follows causal-learn convention)
    14      """
    15      from skl
Assistant (Step 2, test)
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

### ER8-MLP (scripts/eval_er8_mlp.sh)
Results (ER8-MLP):
  SHD=8  F1=0.4444  Precision=0.3636  Recall=0.5714

### ER12-GP (scripts/eval_er12_gp.sh)
Results (ER12-GP):
  SHD=13  F1=0.7083  Precision=0.5862  Recall=0.8947

### SF10-Mixed (scripts/eval_sf10_mixed.sh)
Results (SF10-Mixed):
  SHD=16  F1=0.4000  Precision=0.4286  Recall=0.3750

### ER15-Sigmoid (scripts/eval_er15_sigmoid.sh)
Results (ER15-Sigmoid):
  SHD=20  F1=0.6769  Precision=0.5641  Recall=0.8462

### ER20-Mixed (scripts/eval_er20_mixed.sh)
Results (ER20-Mixed):
  SHD=57  F1=0.2913  Precision=0.3333  Recall=0.2586

### SF20-GP (scripts/eval_sf20_gp.sh)
Results (SF20-GP):
  SHD=16  F1=0.7200  Precision=0.6923  Recall=0.7500

### ER12-LowSample (scripts/eval_er12_lowsample.sh)
Results (ER12-LowSample):
  SHD=24  F1=0.2857  Precision=0.2609  Recall=0.3158

[Leaderboard] Results saved: {'shd_ER8-MLP': 8, 'f1_ER8-MLP': 0.4444, 'precision_ER8-MLP': 0.3636, 
Assistant (Step 3, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "causal-learn/bench/custom_algorithm.py",
4  "start_line": 3,
5  "end_line": 103,
6  "content": "# ANM-GBR-HSIC v2: Improved causal ordering + permutation-importance parent selection\n# Novel contributions:\n# 1. HSIC-sum root scoring for robust causal ordering (not binary tournament)\n# 2. Pairwise R^2 screening to reduce candidate set before multivariate fitting\n# 3. Held-out permutation importance for principled parent selection\n# 4. Dual-criterion edge inclusion: requires both pairwise evidence AND multivariate contribution\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n    \"\"\"\n    Input:  X of shape (n_samples, n_variables)\n    Output: adjacency matrix B of shape (n_variables, n_variables)\n            B[i, j] != 0  means j -> i  (follows causal-learn convention)\n    \"\"\"\n    from sklearn.ensemble import GradientBoostingRegressor\n    from sklearn.preprocessing import StandardScaler\n    import warnings\n    warnings.filterwarnings('ignore')\n\n    n_samples, n_vars = X.shape\n    if n_vars <= 1:\n        return np.zeros((n_vars, n_vars))\n\n    X = StandardScaler().fit_transform(X)\n\n    # Adaptive configuration\n    hsic_n = min(n_samples, 500)\n    n_trees = min(80, max(20, n_samples // 12))\n    min_leaf = max(5, n_samples // 40)\n\n    def hsic_score(x, y):\n        \"\"\"HSIC with RBF kernel and median bandwidth heuristic.\"\"\"\n        rng = np.random.RandomState(42)\n        n = len(x)\n        if n > hsic_n:\n            idx = rng.choice(n, hsic_n, replace=False)\n            x, y = x[idx], y[idx]\n            n = hsic_n\n        x, y = x.ravel(), y.ravel()\n        dx2 = (x[:, None] - x[None, :]) ** 2\n        dy2 = (y[:, None] - y[None, :]) ** 2\n        sx = max(np.median(dx2), 1e-8)\n        sy = max(np.median(dy2), 1e-8)\n        K = np.exp(-dx2 / (2 * sx))\n        L = np.exp(-dy2 / (2 * sy))\n        Kc = K - K.mean(0, keepdims=True) - K.mean(1, keepdims=True) + K.mean()\n        Lc = L - L.mean(0, keepdims=True) - L.mean(1, keepdims=True) + L.mean()\n        return float((Kc * Lc).sum()) / (n * n)\n\n 
   def make_gbr():\n        return GradientBoostingRegressor(\n            n_estimators=n_trees, max_depth=3, learning_rate=0.1,\n            subsample=0.8, min_samples_leaf=min_leaf, random_state=0)\n\n    # === Step 1: Pairwise ANM tests — store HSIC and R^2 ===\n    h_cause = np.zeros((n_vars, n_vars))   # h_cause[c,e] = HSIC when c causes e\n    r2_cause = np.zeros((n_vars, n_vars))  # r2_cause[c,e] = training R^2 when c causes e\n    ss = np.array([np.sum((X[:, i] - X[:, i].mean()) ** 2) + 1e-10\n                    for i in range(n_vars)])\n\n    for i in range(n_vars):\n        for j in range(i + 1, n_vars):\n            # Direction j -> i\n            g = make_gbr(); g.fit(X[:, j:j+1], X[:, i])\n            p = g.predict(X[:, j:j+1])\n            h_cause[j, i] = hsic_score(X[:, i] - p, X[:, j])\n            r2_cause[j, i] = max(0, 1 - np.sum((X[:, i] - p) ** 2) / ss[i])\n\n            # Direction i -> j\n            g = make_gbr(); g.fit(X[:, i:i+1], X[:, j])\n            p = g.predict(X[:, i:i+1])\n            h_cause[i, j] = hsic_score(X[:, j] - p, X[:, i])\n            r2_cause[i, j] = max(0, 1 - np.sum((X[:, j] - p) ** 2) / ss[j])\n\n    # Root score: sum of HSIC when variable acts as cause — lower = more root-like\n    root_score = h_cause.sum(axis=1)\n    order = list(np.argsort(root_score))\n    pos = {v: i for i, v in enumerate(order)}\n\n    # === Step 2: Parent selection with screening + permutation importance ===\n    B = np.zeros((n_vars, n_vars))\n\n    for target in range(n_vars):\n        # Screen: predecessors with pairwise R^2 evidence\n        cands = []\n        for p in range(n_vars):\n            if p == target or pos[p] >= pos[target]:\n                continue\n            if max(r2_cause[p, target], r2_cause[target, p]) > 0.01:\n                cands.append(p)\n\n        if not cands:\n            continue\n\n        if len(cands) == 1:\n            if r2_cause[cands[0], target] > 0.03:\n                B[target, cands[0]] = 1.0\n    
        continue\n\n        # Permutation importance on held-out data (3 random splits)\n        n_splits = 3\n        perm_imp = np.zeros(len(cands))\n        valid = 0\n\n        for s in range(n_splits):\n            rng_s = np.random.RandomState(s * 100)\n            idx = rng_s.permutation(n_samples)\n            nt = int(0.75 * n_samples) if n_samples >= 300 else int(0.8 * n_samples)\n            nt = max(min(nt, n_samples - 15), n_samples // 2)\n            ti, vi = idx[:nt], idx[nt:]\n\n            if len(vi) < 10:\n                continue\n\n            g = make_gbr()\n            g.fit(X[ti][:, cands], X[ti, target])\n            yh = g.predict(X[vi][:, cands])\n            bm = np.mean((X[vi, target] - yh) ** 2)\n            vv = np.var(X[vi, target])\n\n            if bm < 1e-10 or vv < 1e-10 or bm > vv * 0.97:\n                continue\n\n            valid += 1\n            for k in range(len(cands)):\n                Xp = X[vi][:, cands].copy()\n                np.random.RandomState(s * 1000 + k).shuffle(Xp[:, k])\n                pm = np.mean((X[vi, target] - g.predict(Xp)) ** 2)\n                perm_imp[k] += (pm - bm) / (bm + 1e-10)\n\n        if valid > 0:\n            perm_imp /= valid\n            for k, p in enumerate(cands):\n                if perm_imp[k] > 0.03:\n                    B[target, p] = 1.0\n        else:\n            # Fallback: feature importance with conservative threshold\n            g = make_gbr()\n            g.fit(X[:, cands], X[:, target])\n            yh = g.predict(X[:, cands])\n            r2 = 1 - np.sum((X[:, target] - yh) ** 2) / ss[target]\n            if r2 > 0.03:\n                imp = g.feature_importances_\n                th = max(0.08, 1.0 / (len(cands) + 1))\n                for k, p in enumerate(cands):\n                    if imp[k] > th:\n                        B[target, p] = 1.0\n\n    return B\n"
7}
Tool Result (Step 3)
OK: Replaced lines 3..103 with 147 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–149.

[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–149 | total: 149 lines]
     1| import numpy as np
     2| 
     3  # ANM-GBR-HSIC v2: Improved causal ordering + permutation-importance parent selection
     4  # Novel contributions:
     5  # 1. HSIC-sum root scoring for robust causal ordering (not binary tournament)
     6  # 2. Pairwise R^2 screening to reduce candidate set before multivariate fitting
     7  # 3. Held-out permutation importance for principled parent selection
     8  # 4. Dual-criterion edge inclusion: requires both pairwise evidence AND multivariate contribution
     9  def run_causal_discovery(X: np.ndarray) -> np.ndarray:
    10      """
    11      Input:  X of shape (n_samples, n_variables)
    12      Output: adjacency matrix B of shape (n_variables, n_variables)
    13              B[i, j] != 0  means j -> i  (follows causal-learn 
Assistant (Step 4, test)
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### ER8-MLP (scripts/eval_er8_mlp.sh)
Results (ER8-MLP):
  SHD=6  F1=0.7000  Precision=0.5385  Recall=1.0000

### ER12-GP (scripts/eval_er12_gp.sh)
Results (ER12-GP):
  SHD=21  F1=0.4898  Precision=0.4000  Recall=0.6316

### SF10-Mixed (scripts/eval_sf10_mixed.sh)
Results (SF10-Mixed):
  SHD=18  F1=0.4000  Precision=0.3333  Recall=0.5000

### ER15-Sigmoid (scripts/eval_er15_sigmoid.sh)
Results (ER15-Sigmoid):
  SHD=25  F1=0.6216  Precision=0.4792  Recall=0.8846

### ER20-Mixed (scripts/eval_er20_mixed.sh)
Results (ER20-Mixed):
  SHD=64  F1=0.3520  Precision=0.3284  Recall=0.3793

### SF20-GP (scripts/eval_sf20_gp.sh)
Results (SF20-GP):
  SHD=20  F1=0.6329  Precision=0.5814  Recall=0.6944

### ER12-LowSample (scripts/eval_er12_lowsample.sh)
Results (ER12-LowSample):
  SHD=33  F1=0.3509  Precision=0.2632  Recall=0.5263

[Leaderboard] Results saved: {'shd_ER8-MLP': 6, 'f1_ER8-MLP': 0.7, 'precision_ER8-MLP': 0.5385, 'rec
Assistant (Step 5, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "causal-learn/bench/custom_algorithm.py",
4  "start_line": 3,
5  "end_line": 149,
6  "content": "# ANM-GBR-HSIC v3: Binary tournament ordering + R^2-screened parent selection\n# Novel contributions:\n# 1. Pairwise ANM tests with GBR + HSIC for flexible nonlinear causal orientation\n# 2. Pairwise R^2 pre-screening to focus multivariate fits on plausible parents\n# 3. Adaptive importance thresholding with screening-aware calibration\n# 4. Dual-use of pairwise fits: both orientation (HSIC) and screening (R^2)\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n    \"\"\"\n    Input:  X of shape (n_samples, n_variables)\n    Output: adjacency matrix B of shape (n_variables, n_variables)\n            B[i, j] != 0  means j -> i  (follows causal-learn convention)\n    \"\"\"\n    from sklearn.ensemble import GradientBoostingRegressor\n    from sklearn.preprocessing import StandardScaler\n    import warnings\n    warnings.filterwarnings('ignore')\n\n    n_samples, n_vars = X.shape\n    if n_vars <= 1:\n        return np.zeros((n_vars, n_vars))\n\n    X = StandardScaler().fit_transform(X)\n\n    # Adaptive configuration\n    hsic_n = min(n_samples, 500)\n    n_trees = min(80, max(20, n_samples // 12))\n    min_leaf = max(5, n_samples // 40)\n\n    def hsic_score(x, y):\n        \"\"\"HSIC with RBF kernel and median bandwidth heuristic.\"\"\"\n        rng = np.random.RandomState(42)\n        n = len(x)\n        if n > hsic_n:\n            idx = rng.choice(n, hsic_n, replace=False)\n            x, y = x[idx], y[idx]\n            n = hsic_n\n        x, y = x.ravel(), y.ravel()\n        dx2 = (x[:, None] - x[None, :]) ** 2\n        dy2 = (y[:, None] - y[None, :]) ** 2\n        sx = max(np.median(dx2), 1e-8)\n        sy = max(np.median(dy2), 1e-8)\n        K = np.exp(-dx2 / (2 * sx))\n        L = np.exp(-dy2 / (2 * sy))\n        Kc = K - K.mean(0, keepdims=True) - K.mean(1, keepdims=True) + K.mean()\n        Lc = L - L.mean(0, keepdims=True) - L.mean(1, keepdims=True) + L.mean()\n        return float((Kc * Lc).sum()) / (n * n)\n\n    def make_gbr():\n    
    return GradientBoostingRegressor(\n            n_estimators=n_trees, max_depth=3, learning_rate=0.1,\n            subsample=0.8, min_samples_leaf=min_leaf, random_state=0)\n\n    # === Step 1: Pairwise ANM tests — orientation + R^2 for screening ===\n    wins = np.zeros(n_vars)\n    r2_pair = np.zeros((n_vars, n_vars))  # r2_pair[c,e] = R^2 when c causes e\n    ss = np.array([np.sum((X[:, i] - X[:, i].mean()) ** 2) + 1e-10\n                    for i in range(n_vars)])\n\n    for i in range(n_vars):\n        for j in range(i + 1, n_vars):\n            # Direction j -> i\n            g = make_gbr(); g.fit(X[:, j:j+1], X[:, i])\n            pred = g.predict(X[:, j:j+1])\n            h_ji = hsic_score(X[:, i] - pred, X[:, j])\n            r2_pair[j, i] = max(0, 1 - np.sum((X[:, i] - pred) ** 2) / ss[i])\n\n            # Direction i -> j\n            g = make_gbr(); g.fit(X[:, i:i+1], X[:, j])\n            pred = g.predict(X[:, i:i+1])\n            h_ij = hsic_score(X[:, j] - pred, X[:, i])\n            r2_pair[i, j] = max(0, 1 - np.sum((X[:, j] - pred) ** 2) / ss[j])\n\n            # Lower HSIC = more independent residuals = correct direction\n            if h_ji < h_ij:\n                wins[j] += 1  # j -> i\n            else:\n                wins[i] += 1  # i -> j\n\n    # Binary tournament ranking (most wins = most root-like = first)\n    causal_order = list(np.argsort(-wins))\n\n    # === Step 2: Screened parent selection with importance-based pruning ===\n    B = np.zeros((n_vars, n_vars))\n    order_pos = {v: i for i, v in enumerate(causal_order)}\n\n    for idx in range(1, n_vars):\n        target = causal_order[idx]\n        all_preds = causal_order[:idx]\n\n        # Screen: only predecessors with pairwise R^2 evidence\n        cands = []\n        for p in all_preds:\n            if max(r2_pair[p, target], r2_pair[target, p]) > 0.01:\n                cands.append(p)\n\n        if not cands:\n            continue\n\n        if len(cands) == 1:\n           
 # Single candidate: require moderate R^2\n            if r2_pair[cands[0], target] > 0.04:\n                B[target, cands[0]] = 1.0\n            continue\n\n        # Fit GBR with screened candidates\n        Xp = X[:, cands]\n        g = make_gbr(); g.fit(Xp, X[:, target])\n        yhat = g.predict(Xp)\n        r2 = 1 - np.sum((X[:, target] - yhat) ** 2) / ss[target]\n\n        if r2 < 0.03:\n            continue\n\n        # Feature importance with screening-aware adaptive threshold\n        imp = g.feature_importances_\n        threshold = max(0.07, 1.2 / (len(cands) + 1))\n\n        for k, p in enumerate(cands):\n            if imp[k] > threshold:\n                B[target, p] = 1.0\n\n    return B\n"
7}
Tool Result (Step 5)
OK: Replaced lines 3..149 with 122 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–124.

[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–124 | total: 124 lines]
     1| import numpy as np
     2| 
     3  # ANM-GBR-HSIC v3: Binary tournament ordering + R^2-screened parent selection
     4  # Novel contributions:
     5  # 1. Pairwise ANM tests with GBR + HSIC for flexible nonlinear causal orientation
     6  # 2. Pairwise R^2 pre-screening to focus multivariate fits on plausible parents
     7  # 3. Adaptive importance thresholding with screening-aware calibration
     8  # 4. Dual-use of pairwise fits: both orientation (HSIC) and screening (R^2)
     9  def run_causal_discovery(X: np.ndarray) -> np.ndarray:
    10      """
    11      Input:  X of shape (n_samples, n_variables)
    12      Output: adjacency matrix B of shape (n_variables, n_variables)
    13              B[i, j] != 0  means j -> i  (follows causal-learn convention)
    14      
Assistant (Step 6, test)
Tool Result (Step 6)
[Test #3] (0 tests remaining)

### ER8-MLP (scripts/eval_er8_mlp.sh)
Results (ER8-MLP):
  SHD=9  F1=0.3529  Precision=0.3000  Recall=0.4286

### ER12-GP (scripts/eval_er12_gp.sh)
Results (ER12-GP):
  SHD=9  F1=0.7500  Precision=0.7143  Recall=0.7895

### SF10-Mixed (scripts/eval_sf10_mixed.sh)
Results (SF10-Mixed):
  SHD=16  F1=0.3571  Precision=0.4167  Recall=0.3125

### ER15-Sigmoid (scripts/eval_er15_sigmoid.sh)
Results (ER15-Sigmoid):
  SHD=20  F1=0.6441  Precision=0.5758  Recall=0.7308

### ER20-Mixed (scripts/eval_er20_mixed.sh)
Results (ER20-Mixed):
  SHD=58  F1=0.2474  Precision=0.3077  Recall=0.2069

### SF20-GP (scripts/eval_sf20_gp.sh)
Results (SF20-GP):
  SHD=13  F1=0.7429  Precision=0.7647  Recall=0.7222

### ER12-LowSample (scripts/eval_er12_lowsample.sh)
Results (ER12-LowSample):
  SHD=24  F1=0.2162  Precision=0.2222  Recall=0.2105

### ER20-Gauss (scripts/eval_er20_gauss.sh)
Results (ER20-Gauss):
  SHD=50  F1=0.3600  Precision=0.4286  Recall=0.3103

[Leaderboard] Result