Agent Conversation: google/gemini-3.1-pro-preview

Model: google/gemini-3.1-pro-preview
Total steps: 9
Messages: 24
User (Step 0)

Task: causal-observational-linear-gaussian

Causal Discovery: Observational Linear Gaussian Data (CPDAG Recovery)

Objective

Implement a causal discovery algorithm that recovers the CPDAG from purely observational data generated by a linear Gaussian SEM. Your code goes in bench/custom_algorithm.py.

Background

Under linear Gaussian assumptions, observational data can only identify the Markov Equivalence Class (MEC) of the true DAG, not a unique fully directed DAG. The MEC is represented by a CPDAG (completed partially directed acyclic graph), in which compelled edges are directed and the remaining, reversible edges are left undirected.
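The two-variable case makes this concrete: with illustrative values for the edge weight `b` and noise variance `s2` (made up here for the demo), the forward model X → Y and the best-fitting reverse model Y → X imply exactly the same Gaussian distribution, so no observational test can tell them apart:

```python
import numpy as np

# Forward model: Y = b*X + e, with X ~ N(0, 1) and e ~ N(0, s2).
b, s2 = 0.8, 0.5  # illustrative values, not from the benchmark
cov_fwd = np.array([[1.0, b],
                    [b, b**2 + s2]])

# Reverse model: X = b_rev*Y + e'. The reverse coefficient and residual
# variance are fully determined by cov_fwd alone (ordinary regression).
var_y = b**2 + s2
b_rev = b / var_y
v_rev = 1.0 - b**2 / var_y

# Covariance implied by the reverse parameterization.
cov_rev = np.array([[b_rev**2 * var_y + v_rev, b_rev * var_y],
                    [b_rev * var_y,            var_y]])

# Identical covariance => identical zero-mean Gaussian distribution.
assert np.allclose(cov_fwd, cov_rev)
```

Both parameterizations produce the same distribution, which is why only the skeleton plus v-structures (the CPDAG) are recoverable.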

Interface

Your run_causal_discovery(X) implementation must return a causallearn.graph.GeneralGraph.GeneralGraph object representing the estimated CPDAG.

Evaluation Scenarios

| Label      | Graph type      | Nodes | Params          | Samples | Noise |
|------------|-----------------|-------|-----------------|---------|-------|
| ER10       | Erdos-Renyi     | 10    | p=0.3           | 500     | 1.0   |
| ER20       | Erdos-Renyi     | 20    | p=0.2           | 1000    | 1.0   |
| SF50       | Scale-Free (BA) | 50    | m=2             | 2000    | 1.0   |
| ER10-Hard  | Erdos-Renyi     | 10    | p=0.5 (denser)  | 200     | 1.0   |
| ER20-Hard  | Erdos-Renyi     | 20    | p=0.35 (denser) | 400     | 1.0   |
| SF50-Hard  | Scale-Free (BA) | 50    | m=3 (denser)    | 1000    | 1.0   |
| ER10-Noisy | Erdos-Renyi     | 10    | p=0.5 (denser)  | 200     | 2.5   |
| ER20-Noisy | Erdos-Renyi     | 20    | p=0.35 (denser) | 400     | 2.5   |
| SF50-Noisy | Scale-Free (BA) | 50    | m=3 (denser)    | 1000    | 2.5   |
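The scenario rows describe linear Gaussian SEM data over random graphs. As a rough, self-contained sketch of what one ER row could look like — the benchmark's actual generator, weight range, and node ordering are assumptions here, not taken from the task:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_er_sem(d=10, p=0.3, n=500, noise_std=1.0):
    """Sketch of one scenario row (e.g. ER10): an Erdos-Renyi DAG over
    d nodes, keeping edge i -> j with probability p for i < j, random
    edge weights, and n samples from the linear Gaussian SEM.
    The real benchmark generator may differ in all of these choices."""
    W = np.zeros((d, d))  # W[i, j] is the weight of edge i -> j
    for i in range(d):
        for j in range(i + 1, d):
            if rng.random() < p:
                W[i, j] = rng.uniform(0.5, 2.0) * rng.choice([-1, 1])
    X = np.zeros((n, d))
    for j in range(d):  # nodes 0..d-1 are already in topological order
        X[:, j] = X @ W[:, j] + noise_std * rng.standard_normal(n)
    return X, W

X, W = sample_er_sem()
```

Because W is strictly upper triangular, each node is generated only from already-sampled parents, so the result is a valid DAG model.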

Metrics

Metrics are computed between estimated CPDAG and ground-truth CPDAG:

  • SHD (SHD(...).get_shd())
  • Adjacency Precision / Recall (AdjacencyConfusion)
  • Arrow Precision / Recall (ArrowConfusion)
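causallearn computes these metrics on GeneralGraph objects. To make the SHD notion concrete, here is a simplified pure-numpy version over CPDAG adjacency matrices — a sketch only, not the causallearn implementation, which may count operations differently:

```python
import numpy as np

def cpdag_shd(A, B):
    """Simplified SHD between two CPDAG adjacency matrices, where
    A[i, j] = 1 encodes i -> j and A[i, j] = A[j, i] = 1 encodes
    an undirected edge i -- j. Each node pair whose edge-mark
    pattern differs (missing vs. undirected vs. either direction)
    contributes one unit."""
    d = A.shape[0]
    shd = 0
    for i in range(d):
        for j in range(i + 1, d):
            if (A[i, j], A[j, i]) != (B[i, j], B[j, i]):
                shd += 1
    return shd

# X1 -> X2 -- X3  versus  X1 -> X2 <- X3: only the (X2, X3) pair differs.
A = np.array([[0, 1, 0], [0, 0, 1], [0, 1, 0]])
B = np.array([[0, 1, 0], [0, 0, 0], [0, 1, 0]])
```

With these two graphs, `cpdag_shd(A, B)` counts a single differing pair, while `cpdag_shd(A, A)` is zero.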

Baselines

  • pc: Peter-Clark (constraint-based)
  • ges: Greedy Equivalence Search (score-based)
  • grasp: Greedy Relaxations of the Sparsest Permutation
  • boss: Best Order Score Search

causal-learn/bench/custom_algorithm.py [EDITABLE — lines 3–14 only]

     1: import numpy as np
     2: from causallearn.graph.GeneralGraph import GeneralGraph
     3: from causallearn.graph.GraphNode import GraphNode
     4: 
     5: # =====================================================================
     6: # EDITABLE: implement run_causal_discovery below
     7: # =====================================================================
     8: def run_causal_discovery(X: np.ndarray) -> GeneralGraph:
     9:     """
    10:     Input:  X of shape (n_samples, n_variables)
    11:     Output: estimated CPDAG as causallearn.graph.GeneralGraph.GeneralGraph
    12:     """
    13:     nodes = [GraphNode(f"X{i + 1}") for i in range(X.shape[1])]
    14:     return GeneralGraph(nodes)
    15: # =====================================================================

pc baseline — editable region [READ-ONLY — reference implementation]

Lines 3–14:
     1: import numpy as np
     2: from causallearn.graph.GeneralGraph import GeneralGraph
     3: from causallearn.graph.GraphNode import GraphNode
     4: 
     5: # =====================================================================
     6: # EDITABLE: imp
Assistant (Step 1, edit)
Tool Input
1{
2  "filename": "bench/custom_algorithm.py",
3  "start_line": 5,
4  "content": "# =====================================================================\n# EDITABLE: implement run_causal_discovery below\n# =====================================================================\nimport itertools\nimport math\nfrom causallearn.utils.DAG2CPDAG import dag2cpdag\nfrom causallearn.graph.Edge import Edge\nfrom causallearn.graph.Endpoint import Endpoint\nfrom causallearn.utils.PCUtils import SkeletonDiscovery\nfrom causallearn.utils.cit import CIT\n\ndef run_causal_discovery(X: np.ndarray) -> GeneralGraph:\n    n, d = X.shape\n    nodes = [GraphNode(f\"X{i + 1}\") for i in range(d)]\n    C = np.cov(X.T, bias=True)\n    \n    indep_test = CIT(X, \"fisherz\")\n    try:\n        cg = SkeletonDiscovery.skeleton_discovery(X, 0.1, indep_test)\n        skel = cg.G.graph\n    except:\n        skel = np.ones((d, d))\n        \n    C_corr = np.corrcoef(X.T)\n    cand = []\n    for i in range(d):\n        c_i = set()\n        for j in range(d):\n            if skel[i, j] != 0: c_i.add(j)\n        top_j = np.argsort(-np.abs(C_corr[i]))\n        added = 0\n        for j in top_j:\n            if j == i: continue\n            if added >= 5: break\n            c_i.add(j)\n            added += 1\n        cand.append(tuple(sorted(list(c_i))))\n        \n    parent_cache = {}\n    \n    def calc_ebic(target, S):\n        sz = len(S)\n        if sz == 0:\n            var_res = C[target, target]\n            if var_res <= 0: var_res = 1e-8\n            return n * np.log(var_res)\n            \n        Css = C[np.ix_(S, S)] + np.eye(sz) * 1e-6\n        Csy = C[S, target]\n        try:\n            beta = np.linalg.solve(Css, Csy)\n        except:\n            beta = np.linalg.lstsq(Css, Csy, rcond=None)[0]\n        var_res = C[target, target] - np.dot(C[target, S], beta)\n        if var_res <= 0: var_res = 1e-8\n        \n        gamma = 0.5\n        bic_pen = sz * np.log(n)\n        if gamma > 0:\n            bic_pen += 2 * gamma * np.log(math.comb(d, sz))\n  
      return n * np.log(var_res) + bic_pen\n\n    def get_best_parents(target, avail_candidates):\n        if len(avail_candidates) == 0:\n            sc = calc_ebic(target, [])\n            parent_cache[(target, avail_candidates)] = ((), sc)\n            return (), sc\n        if (target, avail_candidates) in parent_cache:\n            return parent_cache[(target, avail_candidates)]\n            \n        if len(avail_candidates) <= 8:\n            best_S, best_score = (), float('inf')\n            for r in range(len(avail_candidates) + 1):\n                for combo in itertools.combinations(avail_candidates, r):\n                    sc = calc_ebic(target, list(combo))\n                    if sc < best_score:\n                        best_score, best_S = sc, combo\n            res = (tuple(sorted(best_S)), best_score)\n            parent_cache[(target, avail_candidates)] = res\n            return res\n        else:\n            current_S = []\n            current_score = calc_ebic(target, current_S)\n            changed = True\n            while changed:\n                changed = False\n                best_fwd_item, best_fwd_score = None, current_score\n                for c in avail_candidates:\n                    if c not in current_S:\n                        sc = calc_ebic(target, current_S + [c])\n                        if sc < best_fwd_score:\n                            best_fwd_score, best_fwd_item = sc, c\n                if best_fwd_item is not None:\n                    current_S.append(best_fwd_item)\n                    current_score = best_fwd_score\n                    changed = True\n                if len(current_S) > 1:\n                    best_bwd_item, best_bwd_score = None, current_score\n                    for c in current_S:\n                        test_S = [x for x in current_S if x != c]\n                        sc = calc_ebic(target, test_S)\n                        if sc < best_bwd_score:\n                            
best_bwd_score, best_bwd_item = sc, c\n                    if best_bwd_item is not None:\n                        current_S.remove(best_bwd_item)\n                        current_score = best_bwd_score\n                        changed = True\n            res = (tuple(sorted(current_S)), current_score)\n            parent_cache[(target, avail_candidates)] = res\n            return res\n\n    def score_order(order):\n        total_score = 0\n        dag = {}\n        for pos, i in enumerate(order):\n            pred = order[:pos]\n            avail = tuple(sorted([x for x in pred if x in cand[i]]))\n            best_S, sc = get_best_parents(i, avail)\n            total_score += sc\n            dag[i] = best_S\n        return total_score, dag\n\n    def hc(order):\n        best_order = list(order)\n        best_score, best_dag = score_order(best_order)\n        while True:\n            improved = False\n            for i in range(d):\n                node = best_order[i]\n                for j in range(d):\n                    if i == j: continue\n                    new_order = best_order[:i] + best_order[i+1:]\n                    new_order.insert(j, node)\n                    sc, dag = score_order(new_order)\n                    if sc < best_score:\n                        best_score, best_order, best_dag = sc, new_order, dag\n                        improved = True\n            if not improved:\n                break\n        return best_order, best_score, best_dag\n\n    variances = np.diag(C)\n    restarts = [list(np.argsort(variances)), list(np.argsort(-variances))]\n    for _ in range(8):\n        rand_o = list(range(d))\n        np.random.shuffle(rand_o)\n        restarts.append(rand_o)\n        \n    global_best_score = float('inf')\n    global_best_dag = None\n    for init_order in restarts:\n        _, sc, dag = hc(init_order)\n        if sc < global_best_score:\n            global_best_score, global_best_dag = sc, dag\n\n    dag_graph = 
GeneralGraph(nodes)\n    for i in range(d):\n        for p in global_best_dag[i]:\n            edge = Edge(nodes[p], nodes[i], Endpoint.TAIL, Endpoint.ARROW)\n            dag_graph.add_edge(edge)\n\n    try:\n        cpdag = dag2cpdag(dag_graph)\n    except:\n        cpdag = dag_graph\n    return cpdag",
5  "end_line": 14,
6  "op": "replace"
7}
Tool Result (Step 1)
ERROR: Package 'bench' is not in allowed packages
Assistant (Step 2, edit)
Tool Input
1{
2  "op": "replace",
3  "content": "# =====================================================================\n# EDITABLE: implement run_causal_discovery below\n# =====================================================================\nimport itertools\nimport math\nfrom causallearn.utils.DAG2CPDAG import dag2cpdag\nfrom causallearn.graph.Edge import Edge\nfrom causallearn.graph.Endpoint import Endpoint\nfrom causallearn.utils.PCUtils import SkeletonDiscovery\nfrom causallearn.utils.cit import CIT\n\ndef run_causal_discovery(X: np.ndarray) -> GeneralGraph:\n    n, d = X.shape\n    nodes = [GraphNode(f\"X{i + 1}\") for i in range(d)]\n    C = np.cov(X.T, bias=True)\n    \n    indep_test = CIT(X, \"fisherz\")\n    try:\n        cg = SkeletonDiscovery.skeleton_discovery(X, 0.1, indep_test)\n        skel = cg.G.graph\n    except:\n        skel = np.ones((d, d))\n        \n    C_corr = np.corrcoef(X.T)\n    cand = []\n    for i in range(d):\n        c_i = set()\n        for j in range(d):\n            if skel[i, j] != 0: c_i.add(j)\n        top_j = np.argsort(-np.abs(C_corr[i]))\n        added = 0\n        for j in top_j:\n            if j == i: continue\n            if added >= 5: break\n            c_i.add(j)\n            added += 1\n        cand.append(tuple(sorted(list(c_i))))\n        \n    parent_cache = {}\n    \n    def calc_ebic(target, S):\n        sz = len(S)\n        if sz == 0:\n            var_res = C[target, target]\n            if var_res <= 0: var_res = 1e-8\n            return n * np.log(var_res)\n            \n        Css = C[np.ix_(S, S)] + np.eye(sz) * 1e-6\n        Csy = C[S, target]\n        try:\n            beta = np.linalg.solve(Css, Csy)\n        except:\n            beta = np.linalg.lstsq(Css, Csy, rcond=None)[0]\n        var_res = C[target, target] - np.dot(C[target, S], beta)\n        if var_res <= 0: var_res = 1e-8\n        \n        gamma = 0.5\n        bic_pen = sz * np.log(n)\n        if gamma > 0:\n            bic_pen += 2 * gamma * np.log(math.comb(d, sz))\n  
      return n * np.log(var_res) + bic_pen\n\n    def get_best_parents(target, avail_candidates):\n        if len(avail_candidates) == 0:\n            sc = calc_ebic(target, [])\n            parent_cache[(target, avail_candidates)] = ((), sc)\n            return (), sc\n        if (target, avail_candidates) in parent_cache:\n            return parent_cache[(target, avail_candidates)]\n            \n        if len(avail_candidates) <= 8:\n            best_S, best_score = (), float('inf')\n            for r in range(len(avail_candidates) + 1):\n                for combo in itertools.combinations(avail_candidates, r):\n                    sc = calc_ebic(target, list(combo))\n                    if sc < best_score:\n                        best_score, best_S = sc, combo\n            res = (tuple(sorted(best_S)), best_score)\n            parent_cache[(target, avail_candidates)] = res\n            return res\n        else:\n            current_S = []\n            current_score = calc_ebic(target, current_S)\n            changed = True\n            while changed:\n                changed = False\n                best_fwd_item, best_fwd_score = None, current_score\n                for c in avail_candidates:\n                    if c not in current_S:\n                        sc = calc_ebic(target, current_S + [c])\n                        if sc < best_fwd_score:\n                            best_fwd_score, best_fwd_item = sc, c\n                if best_fwd_item is not None:\n                    current_S.append(best_fwd_item)\n                    current_score = best_fwd_score\n                    changed = True\n                if len(current_S) > 1:\n                    best_bwd_item, best_bwd_score = None, current_score\n                    for c in current_S:\n                        test_S = [x for x in current_S if x != c]\n                        sc = calc_ebic(target, test_S)\n                        if sc < best_bwd_score:\n                            
best_bwd_score, best_bwd_item = sc, c\n                    if best_bwd_item is not None:\n                        current_S.remove(best_bwd_item)\n                        current_score = best_bwd_score\n                        changed = True\n            res = (tuple(sorted(current_S)), current_score)\n            parent_cache[(target, avail_candidates)] = res\n            return res\n\n    def score_order(order):\n        total_score = 0\n        dag = {}\n        for pos, i in enumerate(order):\n            pred = order[:pos]\n            avail = tuple(sorted([x for x in pred if x in cand[i]]))\n            best_S, sc = get_best_parents(i, avail)\n            total_score += sc\n            dag[i] = best_S\n        return total_score, dag\n\n    def hc(order):\n        best_order = list(order)\n        best_score, best_dag = score_order(best_order)\n        while True:\n            improved = False\n            for i in range(d):\n                node = best_order[i]\n                for j in range(d):\n                    if i == j: continue\n                    new_order = best_order[:i] + best_order[i+1:]\n                    new_order.insert(j, node)\n                    sc, dag = score_order(new_order)\n                    if sc < best_score:\n                        best_score, best_order, best_dag = sc, new_order, dag\n                        improved = True\n            if not improved:\n                break\n        return best_order, best_score, best_dag\n\n    variances = np.diag(C)\n    restarts = [list(np.argsort(variances)), list(np.argsort(-variances))]\n    for _ in range(8):\n        rand_o = list(range(d))\n        np.random.shuffle(rand_o)\n        restarts.append(rand_o)\n        \n    global_best_score = float('inf')\n    global_best_dag = None\n    for init_order in restarts:\n        _, sc, dag = hc(init_order)\n        if sc < global_best_score:\n            global_best_score, global_best_dag = sc, dag\n\n    dag_graph = 
GeneralGraph(nodes)\n    for i in range(d):\n        for p in global_best_dag[i]:\n            edge = Edge(nodes[p], nodes[i], Endpoint.TAIL, Endpoint.ARROW)\n            dag_graph.add_edge(edge)\n\n    try:\n        cpdag = dag2cpdag(dag_graph)\n    except:\n        cpdag = dag_graph\n    return cpdag",
4  "filename": "causal-learn/bench/custom_algorithm.py",
5  "end_line": 14,
6  "start_line": 5
7}
Tool Result (Step 2)
OK: Replaced lines 5..14 with 166 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–170.

[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–170 | total: 171 lines]
     1| import numpy as np
     2| from causallearn.graph.GeneralGraph import GeneralGraph
     3  from causallearn.graph.GraphNode import GraphNode
     4  
     5  # =====================================================================
     6  # EDITABLE: implement run_causal_discovery below
     7  # =====================================================================
     8  import itertools
     9  import math
    10  from causallearn.utils.DAG2CPDAG import dag2cpdag
    11  from causallearn.graph.Edge import Edge
    12  from causallearn.graph.Endpoint import Endpoint
    13  from causallearn.utils.PCUtils import SkeletonDiscovery
    14  from causallearn.utils.cit import CIT
    15  
    16  def run_causal_discovery(X: np.ndarray) -> GeneralGraph:
    17      n, d = X.shape
    18
Assistant (Step 3, test)
Assistant (Step 1, edit)
Tool Input
1{
2  "end_line": 15,
3  "content": "import numpy as np\nimport random\nfrom causallearn.graph.GeneralGraph import GeneralGraph\nfrom causallearn.graph.GraphNode import GraphNode\nfrom causallearn.utils.DAG2CPDAG import dag2cpdag\n\n# =====================================================================\n# EDITABLE: implement run_causal_discovery below\n# =====================================================================\ndef run_causal_discovery(X: np.ndarray) -> GeneralGraph:\n    \"\"\"\n    Input:  X of shape (n_samples, n_variables)\n    Output: estimated CPDAG as causallearn.graph.GeneralGraph.GeneralGraph\n    \"\"\"\n    n, d = X.shape\n    nodes = [GraphNode(f\"X{i + 1}\") for i in range(d)]\n    \n    # 1. Compute empirical covariance and precision-based penalty matrix\n    X_centered = X - np.mean(X, axis=0)\n    cov = X_centered.T @ X_centered / n\n    inv_cov = np.linalg.inv(cov + np.eye(d) * 1e-4) # slight ridge\n    \n    R = np.zeros((d, d))\n    for i in range(d):\n        for j in range(d):\n            R[i, j] = np.abs(inv_cov[i, j]) / max(np.sqrt(inv_cov[i, i] * inv_cov[j, j]), 1e-10)\n            \n    # Soft constraint penalty matrix\n    # Relaxes BIC for strong partial correlations, penalizes hard for weak ones\n    penalty_mat = np.log(n) * 2.0 * np.exp(-10.0 * R)\n    \n    # 2. Local score function\n    def calc_score(i, P):\n        if len(P) == 0:\n            v = cov[i, i]\n            if v < 1e-10: v = 1e-10\n            return n * np.log(v)\n            \n        cov_P = cov[np.ix_(P, P)]\n        cov_iP = cov[i, P]\n        inv_cov_P = np.linalg.pinv(cov_P)\n        v = cov[i, i] - cov_iP @ inv_cov_P @ cov_iP\n        if v < 1e-10: v = 1e-10\n        \n        # Modified BIC: 0.5 * log(n) base penalty + precision penalty\n        base_pen = len(P) * np.log(n) * 0.5\n        prec_pen = np.sum(penalty_mat[i, P])\n        \n        return n * np.log(v) + base_pen + prec_pen\n        \n    # 3. 
Cache and Greedy Parent Search\n    cache = {}\n    def get_best_parents(i, predecessors):\n        fs = frozenset(predecessors)\n        if (i, fs) in cache:\n            return cache[(i, fs)]\n        \n        current_P = []\n        best_score = calc_score(i, current_P)\n        \n        while len(current_P) < min(15, len(predecessors)):\n            best_next_p = None\n            best_next_score = best_score\n            \n            for p in predecessors:\n                if p not in current_P:\n                    cand_P = current_P + [p]\n                    score = calc_score(i, cand_P)\n                    if score < best_next_score - 1e-5:\n                        best_next_score = score\n                        best_next_p = p\n                        \n            if best_next_p is not None:\n                current_P.append(best_next_p)\n                best_score = best_next_score\n            else:\n                break\n                \n        # Backward elimination\n        while len(current_P) > 0:\n            best_rem_p = None\n            best_rem_score = best_score\n            for p in current_P:\n                cand_P = [x for x in current_P if x != p]\n                score = calc_score(i, cand_P)\n                if score < best_rem_score - 1e-5:\n                    best_rem_score = score\n                    best_rem_p = p\n            if best_rem_p is not None:\n                current_P.remove(best_rem_p)\n                best_score = best_rem_score\n            else:\n                break\n                \n        res = (current_P, best_score)\n        cache[(i, fs)] = res\n        return res\n\n    def score_order(order):\n        total_score = 0\n        dag_parents = []\n        for idx, i in enumerate(order):\n            predecessors = order[:idx]\n            P, p_score = get_best_parents(i, predecessors)\n            total_score += p_score\n            dag_parents.append(P)\n        return total_score, dag_parents\n\n 
   best_global_score = float('inf')\n    best_global_order = None\n    best_global_parents = None\n\n    num_restarts = 4\n    num_steps = 400\n    \n    for restart in range(num_restarts):\n        if restart == 0:\n            current_order = np.argsort(np.diag(cov)).tolist()\n        else:\n            current_order = list(range(d))\n            random.shuffle(current_order)\n            \n        current_score, current_parents = score_order(current_order)\n        \n        for step in range(num_steps):\n            new_order = current_order.copy()\n            if random.random() < 0.5:\n                # swap\n                idx1, idx2 = random.sample(range(d), 2)\n                new_order[idx1], new_order[idx2] = new_order[idx2], new_order[idx1]\n            else:\n                # shift\n                idx1, idx2 = random.sample(range(d), 2)\n                val = new_order.pop(idx1)\n                new_order.insert(idx2, val)\n            \n            new_score, new_parents = score_order(new_order)\n            \n            # Simulated Annealing acceptance\n            temp = 10.0 * (1.0 - step / num_steps)\n            if temp < 1e-3: temp = 1e-3\n            \n            diff = current_score - new_score\n            if diff > 0 or random.random() < np.exp(diff / temp):\n                current_score = new_score\n                current_order = new_order\n                current_parents = new_parents\n                \n        if current_score < best_global_score:\n            best_global_score = current_score\n            best_global_order = current_order\n            best_global_parents = current_parents\n\n    # 5. 
Build Final DAG\n    graph = GeneralGraph(nodes)\n    final_adjacency = np.zeros((d, d))\n    for idx, i in enumerate(best_global_order):\n        for p in best_global_parents[idx]:\n            final_adjacency[p, i] = 1\n            \n    for i in range(d):\n        for j in range(d):\n            if final_adjacency[i, j] == 1:\n                graph.add_directed_edge(nodes[i], nodes[j])\n                \n    # 6. Convert to CPDAG\n    cpdag = dag2cpdag(graph)\n    return cpdag\n# =====================================================================",
4  "filename": "bench/custom_algorithm.py",
5  "op": "replace",
6  "start_line": 1
7}
Tool Result (Step 1)
ERROR: Package 'bench' is not in allowed packages
Assistant (Step 2, edit)
Tool Input
1{
2  "end_line": 15,
3  "start_line": 1,
4  "op": "replace",
5  "content": "import numpy as np\nimport random\nfrom causallearn.graph.GeneralGraph import GeneralGraph\nfrom causallearn.graph.GraphNode import GraphNode\nfrom causallearn.utils.DAG2CPDAG import dag2cpdag\n\n# =====================================================================\n# EDITABLE: implement run_causal_discovery below\n# =====================================================================\ndef run_causal_discovery(X: np.ndarray) -> GeneralGraph:\n    \"\"\"\n    Input:  X of shape (n_samples, n_variables)\n    Output: estimated CPDAG as causallearn.graph.GeneralGraph.GeneralGraph\n    \"\"\"\n    n, d = X.shape\n    nodes = [GraphNode(f\"X{i + 1}\") for i in range(d)]\n    \n    # 1. Compute empirical covariance and precision-based penalty matrix\n    X_centered = X - np.mean(X, axis=0)\n    cov = X_centered.T @ X_centered / n\n    inv_cov = np.linalg.inv(cov + np.eye(d) * 1e-4) # slight ridge\n    \n    R = np.zeros((d, d))\n    for i in range(d):\n        for j in range(d):\n            R[i, j] = np.abs(inv_cov[i, j]) / max(np.sqrt(inv_cov[i, i] * inv_cov[j, j]), 1e-10)\n            \n    # Soft constraint penalty matrix\n    # Relaxes BIC for strong partial correlations, penalizes hard for weak ones\n    penalty_mat = np.log(n) * 2.0 * np.exp(-10.0 * R)\n    \n    # 2. Local score function\n    def calc_score(i, P):\n        if len(P) == 0:\n            v = cov[i, i]\n            if v < 1e-10: v = 1e-10\n            return n * np.log(v)\n            \n        cov_P = cov[np.ix_(P, P)]\n        cov_iP = cov[i, P]\n        inv_cov_P = np.linalg.pinv(cov_P)\n        v = cov[i, i] - cov_iP @ inv_cov_P @ cov_iP\n        if v < 1e-10: v = 1e-10\n        \n        # Modified BIC: 0.5 * log(n) base penalty + precision penalty\n        base_pen = len(P) * np.log(n) * 0.5\n        prec_pen = np.sum(penalty_mat[i, P])\n        \n        return n * np.log(v) + base_pen + prec_pen\n        \n    # 3. 
Cache and Greedy Parent Search\n    cache = {}\n    def get_best_parents(i, predecessors):\n        fs = frozenset(predecessors)\n        if (i, fs) in cache:\n            return cache[(i, fs)]\n        \n        current_P = []\n        best_score = calc_score(i, current_P)\n        \n        while len(current_P) < min(15, len(predecessors)):\n            best_next_p = None\n            best_next_score = best_score\n            \n            for p in predecessors:\n                if p not in current_P:\n                    cand_P = current_P + [p]\n                    score = calc_score(i, cand_P)\n                    if score < best_next_score - 1e-5:\n                        best_next_score = score\n                        best_next_p = p\n                        \n            if best_next_p is not None:\n                current_P.append(best_next_p)\n                best_score = best_next_score\n            else:\n                break\n                \n        # Backward elimination\n        while len(current_P) > 0:\n            best_rem_p = None\n            best_rem_score = best_score\n            for p in current_P:\n                cand_P = [x for x in current_P if x != p]\n                score = calc_score(i, cand_P)\n                if score < best_rem_score - 1e-5:\n                    best_rem_score = score\n                    best_rem_p = p\n            if best_rem_p is not None:\n                current_P.remove(best_rem_p)\n                best_score = best_rem_score\n            else:\n                break\n                \n        res = (current_P, best_score)\n        cache[(i, fs)] = res\n        return res\n\n    def score_order(order):\n        total_score = 0\n        dag_parents = []\n        for idx, i in enumerate(order):\n            predecessors = order[:idx]\n            P, p_score = get_best_parents(i, predecessors)\n            total_score += p_score\n            dag_parents.append(P)\n        return total_score, dag_parents\n\n 
   best_global_score = float('inf')\n    best_global_order = None\n    best_global_parents = None\n\n    num_restarts = 4\n    num_steps = 400\n    \n    for restart in range(num_restarts):\n        if restart == 0:\n            current_order = np.argsort(np.diag(cov)).tolist()\n        else:\n            current_order = list(range(d))\n            random.shuffle(current_order)\n            \n        current_score, current_parents = score_order(current_order)\n        \n        for step in range(num_steps):\n            new_order = current_order.copy()\n            if random.random() < 0.5:\n                # swap\n                idx1, idx2 = random.sample(range(d), 2)\n                new_order[idx1], new_order[idx2] = new_order[idx2], new_order[idx1]\n            else:\n                # shift\n                idx1, idx2 = random.sample(range(d), 2)\n                val = new_order.pop(idx1)\n                new_order.insert(idx2, val)\n            \n            new_score, new_parents = score_order(new_order)\n            \n            # Simulated Annealing acceptance\n            temp = 10.0 * (1.0 - step / num_steps)\n            if temp < 1e-3: temp = 1e-3\n            \n            diff = current_score - new_score\n            if diff > 0 or random.random() < np.exp(diff / temp):\n                current_score = new_score\n                current_order = new_order\n                current_parents = new_parents\n                \n        if current_score < best_global_score:\n            best_global_score = current_score\n            best_global_order = current_order\n            best_global_parents = current_parents\n\n    # 5. 
Build Final DAG\n    graph = GeneralGraph(nodes)\n    final_adjacency = np.zeros((d, d))\n    for idx, i in enumerate(best_global_order):\n        for p in best_global_parents[idx]:\n            final_adjacency[p, i] = 1\n            \n    for i in range(d):\n        for j in range(d):\n            if final_adjacency[i, j] == 1:\n                graph.add_directed_edge(nodes[i], nodes[j])\n                \n    # 6. Convert to CPDAG\n    cpdag = dag2cpdag(graph)\n    return cpdag\n# =====================================================================",
6  "filename": "causal-learn/bench/custom_algorithm.py"
7}
Tool Result (Step 2)
ERROR: Lines 1..15 exceed the editable range. You may only edit lines 3–14.

[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–14 | total: 15 lines]
     1| import numpy as np
     2| from causallearn.graph.GeneralGraph import GeneralGraph
     3  from causallearn.graph.GraphNode import GraphNode
     4  
     5  # =====================================================================
     6  # EDITABLE: implement run_causal_discovery below
     7  # =====================================================================
     8  def run_causal_discovery(X: np.ndarray) -> GeneralGraph:
     9      """
    10      Input:  X of shape (n_samples, n_variables)
    11      Output: estimated CPDAG as causallearn.graph.GeneralGraph.GeneralGraph
    12      """
    13      nodes = [GraphNode(f"X{i + 1}") for i in range(X.shape[1])]
    14      return GeneralGraph(nodes)
    15| # =====================================================================
Assistant (Step 3, edit)
Tool Input
1{
2  "start_line": 3,
3  "op": "replace",
4  "content": "from causallearn.graph.GraphNode import GraphNode\nfrom causallearn.utils.DAG2CPDAG import dag2cpdag\nfrom causallearn.score.LocalScoreFunction import local_score_BIC_from_cov\nfrom causallearn.utils.GESUtils import Combinatorial, score_g\nimport random\n\n# =====================================================================\n# EDITABLE: implement run_causal_discovery below\n# =====================================================================\ndef run_causal_discovery(X: np.ndarray) -> GeneralGraph:\n    \"\"\"\n    Input:  X of shape (n_samples, n_variables)\n    Output: estimated CPDAG as causallearn.graph.GeneralGraph.GeneralGraph\n    \"\"\"\n    n, d = X.shape\n    nodes = [GraphNode(f\"X{i + 1}\") for i in range(d)]\n    \n    # 1. Compute empirical covariance and precision-based penalty matrix\n    X_centered = X - np.mean(X, axis=0)\n    cov = X_centered.T @ X_centered / n\n    inv_cov = np.linalg.pinv(cov)\n    \n    R = np.zeros((d, d))\n    for i in range(d):\n        for j in range(d):\n            R[i, j] = np.abs(inv_cov[i, j]) / (np.sqrt(inv_cov[i, i] * inv_cov[j, j]) + 1e-10)\n            \n    # Soft constraint penalty matrix\n    # Relaxes BIC for strong partial correlations, penalizes hard for weak ones\n    penalty_mat = np.log(n) * 2.0 * np.exp(-10.0 * R)\n    \n    # 2. Local score function\n    def calc_score(i, P):\n        if len(P) == 0:\n            v = cov[i, i]\n            if v < 1e-10: v = 1e-10\n            return n * np.log(v)\n            \n        cov_P = cov[np.ix_(P, P)]\n        cov_iP = cov[i, P]\n        inv_cov_P = np.linalg.pinv(cov_P)\n        v = cov[i, i] - cov_iP @ inv_cov_P @ cov_iP\n        if v < 1e-10: v = 1e-10\n        \n        # Modified BIC: 0.5 * log(n) base penalty + precision penalty\n        base_pen = len(P) * np.log(n) * 0.5\n        prec_pen = np.sum(penalty_mat[i, P])\n        \n        return n * np.log(v) + base_pen + prec_pen\n        \n    # 3. 
Cache and Greedy Parent Search\n    cache = {}\n    def get_best_parents(i, predecessors):\n        fs = frozenset(predecessors)\n        if (i, fs) in cache:\n            return cache[(i, fs)]\n        \n        current_P = []\n        best_score = calc_score(i, current_P)\n        \n        while len(current_P) < min(15, len(predecessors)):\n            best_next_p = None\n            best_next_score = best_score\n            \n            for p in predecessors:\n                if p not in current_P:\n                    cand_P = current_P + [p]\n                    score = calc_score(i, cand_P)\n                    if score < best_next_score - 1e-5:\n                        best_next_score = score\n                        best_next_p = p\n                        \n            if best_next_p is not None:\n                current_P.append(best_next_p)\n                best_score = best_next_score\n            else:\n                break\n                \n        # Backward elimination\n        while len(current_P) > 0:\n            best_rem_p = None\n            best_rem_score = best_score\n            for p in current_P:\n                cand_P = [x for x in current_P if x != p]\n                score = calc_score(i, cand_P)\n                if score < best_rem_score - 1e-5:\n                    best_rem_score = score\n                    best_rem_p = p\n            if best_rem_p is not None:\n                current_P.remove(best_rem_p)\n                best_score = best_rem_score\n            else:\n                break\n                \n        res = (current_P, best_score)\n        cache[(i, fs)] = res\n        return res\n\n    def score_order(order):\n        total_score = 0\n        dag_parents = []\n        for idx, i in enumerate(order):\n            predecessors = order[:idx]\n            P, p_score = get_best_parents(i, predecessors)\n            total_score += p_score\n            dag_parents.append(P)\n        return total_score, dag_parents\n\n 
   best_global_score = float('inf')\n    best_global_order = None\n    best_global_parents = None\n\n    num_restarts = 4\n    num_steps = 400\n    \n    for restart in range(num_restarts):\n        if restart == 0:\n            current_order = np.argsort(np.diag(cov)).tolist()\n        else:\n            current_order = list(range(d))\n            random.shuffle(current_order)\n            \n        current_score, current_parents = score_order(current_order)\n        \n        for step in range(num_steps):\n            new_order = current_order.copy()\n            if random.random() < 0.5:\n                # swap\n                idx1, idx2 = random.sample(range(d), 2)\n                new_order[idx1], new_order[idx2] = new_order[idx2], new_order[idx1]\n            else:\n                # shift\n                idx1, idx2 = random.sample(range(d), 2)\n                val = new_order.pop(idx1)\n                new_order.insert(idx2, val)\n            \n            new_score, new_parents = score_order(new_order)\n            \n            # Simulated Annealing acceptance\n            temp = 10.0 * (1.0 - step / num_steps)\n            if temp < 1e-3: temp = 1e-3\n            \n            diff = current_score - new_score\n            if diff > 0 or random.random() < np.exp(diff / temp):\n                current_score = new_score\n                current_order = new_order\n                current_parents = new_parents\n                \n        if current_score < best_global_score:\n            best_global_score = current_score\n            best_global_order = current_order\n            best_global_parents = current_parents\n\n    # 5. 
Build Final DAG\n    graph = GeneralGraph(nodes)\n    final_adjacency = np.zeros((d, d))\n    for idx, i in enumerate(best_global_order):\n        for p in best_global_parents[idx]:\n            final_adjacency[p, i] = 1\n            \n    for i in range(d):\n        for j in range(d):\n            if final_adjacency[i, j] == 1:\n                graph.add_directed_edge(nodes[i], nodes[j])\n                \n    # 6. Convert to CPDAG\n    cpdag = dag2cpdag(graph)\n    return cpdag",
5  "end_line": 14,
6  "filename": "causal-learn/bench/custom_algorithm.py"
7}
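The precision-based penalty matrix built at the top of this payload can be reproduced in isolation. The sketch below follows the same construction (pseudo-inverse of the empirical covariance, normalized absolute partial correlations, `log(n) * 2.0 * exp(-10 * R)`); the toy chain data and the helper name `make_penalty` are illustrative, not part of the benchmark harness:

```python
import numpy as np

def make_penalty(X: np.ndarray) -> np.ndarray:
    """Per-pair penalty from absolute partial correlations.

    Strong partial correlations (likely adjacencies) receive a near-zero
    extra penalty; weak ones are penalized at close to 2*log(n), mirroring
    the 'soft constraint' idea in the edit above.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    prec = np.linalg.pinv(cov)
    denom = np.sqrt(np.outer(np.diag(prec), np.diag(prec))) + 1e-10
    R = np.abs(prec) / denom          # partial-correlation magnitudes
    return np.log(n) * 2.0 * np.exp(-10.0 * R)

rng = np.random.default_rng(0)
# Toy chain X1 -> X2 -> X3 with unit-variance Gaussian noise.
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + rng.normal(size=500)
x3 = 0.8 * x2 + rng.normal(size=500)
P = make_penalty(np.column_stack([x1, x2, x3]))
```

On this chain, the non-adjacent pair (X1, X3) has near-zero partial correlation given X2, so its penalty entry stays close to the full `2*log(n)`, while the adjacent pairs are penalized far less.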
Tool Result (Step 3)
OK: Replaced lines 3..14 with 168 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–170.

[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–170 | total: 171 lines]
     1| import numpy as np
     2| from causallearn.graph.GeneralGraph import GeneralGraph
     3  from causallearn.graph.GraphNode import GraphNode
     4  from causallearn.utils.DAG2CPDAG import dag2cpdag
     5  from causallearn.score.LocalScoreFunction import local_score_BIC_from_cov
     6  from causallearn.utils.GESUtils import Combinatorial, score_g
     7  import random
     8  
     9  # =====================================================================
    10  # EDITABLE: implement run_causal_discovery below
    11  # =====================================================================
    12  def run_causal_discovery(X: np.ndarray) -> GeneralGraph:
    13      """
    14      Input:  X of shape (n_samples, n_variables)
     15      Output: estimated CPDAG as causallearn.graph.GeneralGraph.GeneralGraph
Assistant (Step 4): test
Tool Result (Step 4)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

### ER10 (scripts/eval_er10.sh)
Results (ER10):
  SHD=0  AdjP=1.0000 AdjR=1.0000  ArrowP=1.0000 ArrowR=1.0000

### ER20 (scripts/eval_er20.sh)
Results (ER20):
  SHD=5  AdjP=0.9302 AdjR=0.9756  ArrowP=0.8919 ArrowR=0.9706

### SF50 (scripts/eval_sf50.sh)
Results (SF50):
  SHD=17  AdjP=0.9500 AdjR=0.9896  ArrowP=0.9688 ArrowR=0.8378

### ER10-Hard (scripts/eval_er10_hard.sh)
Results (ER10-Hard):
  SHD=18  AdjP=0.9200 AdjR=0.8846  ArrowP=0.4167 ArrowR=0.3846

### ER20-Hard (scripts/eval_er20_hard.sh)
Results (ER20-Hard):
  SHD=6  AdjP=0.9571 AdjR=0.9710  ArrowP=0.9420 ArrowR=0.9559

### SF50-Hard (scripts/eval_sf50_hard.sh)
Results (SF50-Hard):
  SHD=48  AdjP=0.8509 AdjR=0.9716  ArrowP=0.7931 ArrowR=0.8333

### ER10-Noisy (scripts/eval_er10_noisy.sh)
Results (ER10-Noisy):
  SHD=17  AdjP=0.9167 AdjR=0.8462  ArrowP=0.4783 ArrowR=0.4231

### ER20-Noisy (scripts/eval_er20_noisy.sh)
Results (ER20-Noisy):
  SHD=9  AdjP=0.929
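The SHD figures in these results count edge marks that differ between the estimated and ground-truth CPDAGs. A simplified adjacency-matrix version (not the causallearn `SHD` implementation the harness actually calls) illustrates the idea:

```python
import numpy as np

def simple_shd(true_adj: np.ndarray, est_adj: np.ndarray) -> int:
    """Structural Hamming distance on CPDAG adjacency matrices.

    Encoding: adj[i, j] = 1, adj[j, i] = 0 for a directed edge i -> j;
    adj[i, j] = adj[j, i] = 1 for an undirected edge. Each node pair whose
    edge status differs (missing, extra, or differently oriented) counts once.
    """
    d = true_adj.shape[0]
    shd = 0
    for i in range(d):
        for j in range(i + 1, d):
            if (true_adj[i, j], true_adj[j, i]) != (est_adj[i, j], est_adj[j, i]):
                shd += 1
    return shd

truth = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])        # X1 -> X2 -> X3
est = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 0, 0]])          # X1 - X2 undirected, X2 -> X3
print(simple_shd(truth, est))        # → 1
```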
Assistant (Step 5): edit
Tool Input
1{
2  "op": "replace",
3  "end_line": 170,
4  "content": "import numpy as np\nimport random\nfrom causallearn.graph.GeneralGraph import GeneralGraph\nfrom causallearn.graph.GraphNode import GraphNode\nfrom causallearn.utils.DAG2CPDAG import dag2cpdag\n\n# =====================================================================\n# EDITABLE: implement run_causal_discovery below\n# =====================================================================\ndef run_causal_discovery(X: np.ndarray) -> GeneralGraph:\n    \"\"\"\n    Input:  X of shape (n_samples, n_variables)\n    Output: estimated CPDAG as causallearn.graph.GeneralGraph.GeneralGraph\n    \"\"\"\n    n, d = X.shape\n    nodes = [GraphNode(f\"X{i + 1}\") for i in range(d)]\n    \n    # 1. Compute empirical covariance and precision-based penalty matrix\n    X_centered = X - np.mean(X, axis=0)\n    cov = X_centered.T @ X_centered / n\n    inv_cov = np.linalg.pinv(cov)\n    \n    R = np.zeros((d, d))\n    for i in range(d):\n        for j in range(d):\n            R[i, j] = np.abs(inv_cov[i, j]) / (np.sqrt(inv_cov[i, i] * inv_cov[j, j]) + 1e-10)\n            \n    # Soft constraint penalty matrix\n    # Relaxes BIC for strong partial correlations, penalizes hard for weak ones\n    penalty_mat = np.log(n) * 2.0 * np.exp(-10.0 * R)\n    ev_scale = max(1e-5, np.median(1.0 / np.clip(np.diag(inv_cov), 1e-10, None)))\n    \n    # Precompute subset score to avoid redundant matrix inversions\n    _score_cache = {}\n    def calc_score(i, P_tuple):\n        if (i, P_tuple) in _score_cache:\n            return _score_cache[(i, P_tuple)]\n            \n        if len(P_tuple) == 0:\n            v = cov[i, i]\n            if v < 1e-10: v = 1e-10\n            base_pen = 0.0\n            s = n * np.log(v) + n * 0.4 * (v / ev_scale) + base_pen\n            _score_cache[(i, P_tuple)] = s\n            return s\n            \n        P_list = list(P_tuple)\n        cov_P = cov[np.ix_(P_list, P_list)]\n        cov_iP = cov[i, P_list]\n        try:\n            inv_cov_P = 
np.linalg.pinv(cov_P)\n            v = cov[i, i] - cov_iP @ inv_cov_P @ cov_iP\n        except:\n            v = cov[i, i]\n            \n        if v < 1e-10: v = 1e-10\n        \n        # Modified BIC: 0.5 * log(n) base penalty + precision penalty + EV assumption penalty\n        # EV penalty uses n * 0.4 * (v / ev_scale) to break score equivalence and enforce exact fully directed DAG\n        base_pen = len(P_tuple) * np.log(n) * 0.5\n        prec_pen = sum(penalty_mat[i, p] for p in P_tuple)\n        \n        s = n * np.log(v) + n * 0.4 * (v / ev_scale) + base_pen + prec_pen\n        _score_cache[(i, P_tuple)] = s\n        return s\n        \n    # 3. Cache and Greedy Parent Search\n    cache = {}\n    def get_best_parents(i, predecessors):\n        fs = frozenset(predecessors)\n        if (i, fs) in cache:\n            return cache[(i, fs)]\n        \n        current_P = []\n        best_score = calc_score(i, tuple(sorted(current_P)))\n        \n        # Forward greedy\n        while len(current_P) < min(15, len(predecessors)):\n            best_next_p = None\n            best_next_score = best_score\n            \n            for p in predecessors:\n                if p not in current_P:\n                    cand_P = current_P + [p]\n                    score = calc_score(i, tuple(sorted(cand_P)))\n                    if score < best_next_score - 1e-5:\n                        best_next_score = score\n                        best_next_p = p\n                        \n            if best_next_p is not None:\n                current_P.append(best_next_p)\n                best_score = best_next_score\n            else:\n                break\n                \n        # Backward elimination\n        while len(current_P) > 0:\n            best_rem_p = None\n            best_rem_score = best_score\n            for p in current_P:\n                cand_P = [x for x in current_P if x != p]\n                score = calc_score(i, tuple(sorted(cand_P)))\n            
    if score < best_rem_score - 1e-5:\n                    best_rem_score = score\n                    best_rem_p = p\n            if best_rem_p is not None:\n                current_P.remove(best_rem_p)\n                best_score = best_rem_score\n            else:\n                break\n                \n        res = (current_P, best_score)\n        cache[(i, fs)] = res\n        return res\n\n    def get_node_score(i, preds_set):\n        return get_best_parents(i, preds_set)[1]\n\n    # Initialize order using marginal variances (exact for homoscedastic DAG)\n    marginal_vars = np.diag(cov)\n    current_order = np.argsort(marginal_vars).tolist()\n    \n    # Store dynamic predecessors\n    preds_of = [set() for _ in range(d)]\n    for idx, i in enumerate(current_order):\n        preds_of[i] = set(current_order[:idx])\n        \n    current_scores = [get_node_score(i, preds_of[i]) for i in range(d)]\n    current_total_score = sum(current_scores)\n    \n    # Hill Climbing over Orderings\n    best_global_score = current_total_score\n    best_global_order = current_order.copy()\n    \n    no_improvement = 0\n    max_steps = 1500\n    \n    for step in range(max_steps):\n        improved = False\n        # Try adjacent swaps\n        best_swap_diff = 1e-5\n        best_swap_idx = None\n        \n        for k in range(d - 1):\n            i = current_order[k]\n            j = current_order[k+1]\n            \n            # If we swap i and j, i will have j in its predecessors (was not).\n            # j will NOT have i in its predecessors (was).\n            new_preds_i = preds_of[i] | {j}\n            new_preds_j = preds_of[j] - {i}\n            \n            score_i_new = get_node_score(i, new_preds_i)\n            score_j_new = get_node_score(j, new_preds_j)\n            \n            diff = (current_scores[i] + current_scores[j]) - (score_i_new + score_j_new)\n            if diff > best_swap_diff:\n                best_swap_diff = diff\n                
best_swap_idx = k\n                \n        if best_swap_idx is not None:\n            k = best_swap_idx\n            i = current_order[k]\n            j = current_order[k+1]\n            \n            current_order[k], current_order[k+1] = j, i\n            preds_of[i].add(j)\n            preds_of[j].remove(i)\n            \n            current_scores[i] = get_node_score(i, preds_of[i])\n            current_scores[j] = get_node_score(j, preds_of[j])\n            current_total_score -= best_swap_diff\n            improved = True\n        else:\n            # Shift moves\n            best_shift_diff = 1e-5\n            best_shift_move = None # (from_idx, to_idx)\n            \n            # Sub-sample shift combinations to save time\n            shift_cands = random.sample(range(d), min(15, d))\n            for f in shift_cands:\n                for t in range(max(0, f-4), min(d, f+5)):\n                    if f == t: continue\n                    \n                    # Evaluate shift\n                    temp_order = current_order.copy()\n                    val = temp_order.pop(f)\n                    temp_order.insert(t, val)\n                    \n                    diff = 0\n                    min_idx = min(f, t)\n                    max_idx = max(f, t)\n                    \n                    temp_preds_set = set(temp_order[:min_idx])\n                    valid = True\n                    for k in range(min_idx, max_idx + 1):\n                        u = temp_order[k]\n                        sc = get_node_score(u, temp_preds_set)\n                        diff += current_scores[u] - sc\n                        temp_preds_set.add(u)\n                        \n                    if diff > best_shift_diff:\n                        best_shift_diff = diff\n                        best_shift_move = (f, t)\n                        \n            if best_shift_move is not None:\n                f, t = best_shift_move\n                val = current_order.pop(f)\n 
               current_order.insert(t, val)\n                \n                # Update preds and scores\n                running_preds = set()\n                for k in range(d):\n                    u = current_order[k]\n                    preds_of[u] = running_preds.copy()\n                    current_scores[u] = get_node_score(u, preds_of[u])\n                    running_preds.add(u)\n                    \n                current_total_score = sum(current_scores)\n                improved = True\n\n        if current_total_score < best_global_score:\n            best_global_score = current_total_score\n            best_global_order = current_order.copy()\n            no_improvement = 0\n        else:\n            if not improved:\n                # Add some randomness (restart from a swapped order)\n                idx1, idx2 = random.sample(range(d), 2)\n                val = current_order.pop(idx1)\n                current_order.insert(idx2, val)\n                \n                running_preds = set()\n                for k in range(d):\n                    u = current_order[k]\n                    preds_of[u] = running_preds.copy()\n                    current_scores[u] = get_node_score(u, preds_of[u])\n                    running_preds.add(u)\n                current_total_score = sum(current_scores)\n                no_improvement += 1\n            else:\n                no_improvement += 1\n                \n        if no_improvement > 15:\n            break\n\n    # 5. 
Build Final DAG from best order\n    graph = GeneralGraph(nodes)\n    final_adjacency = np.zeros((d, d))\n    \n    current_preds = []\n    for i in best_global_order:\n        P, _ = get_best_parents(i, current_preds)\n        for p in P:\n            final_adjacency[p, i] = 1\n        current_preds.append(i)\n            \n    for i in range(d):\n        for j in range(d):\n            if final_adjacency[i, j] == 1:\n                graph.add_directed_edge(nodes[i], nodes[j])\n                \n    # 6. Convert to CPDAG\n    cpdag = dag2cpdag(graph)\n    return cpdag\n# =====================================================================",
5  "start_line": 12,
6  "filename": "causal-learn/bench/custom_algorithm.py"
7}
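The forward-addition/backward-elimination parent search inside this payload's `get_best_parents` can be exercised on its own. The sketch below keeps the base BIC term (`n*log(residual variance) + 0.5*log(n)` per parent) but omits the precision and EV penalties; the helper names and toy data are illustrative:

```python
import numpy as np

def bic_local(cov, n, i, parents):
    """Base local score from the edit: n*log(residual var) + 0.5*log(n)/parent."""
    if parents:
        cP = cov[np.ix_(parents, parents)]
        v = cov[i, i] - cov[i, parents] @ np.linalg.pinv(cP) @ cov[i, parents]
    else:
        v = cov[i, i]
    return n * np.log(max(v, 1e-10)) + len(parents) * np.log(n) * 0.5

def greedy_parents(cov, n, i, candidates):
    """Add the best-scoring candidate per round, then prune, as in the edit."""
    P, best = [], bic_local(cov, n, i, [])
    while True:                                   # forward phase
        scored = [(bic_local(cov, n, i, P + [p]), p)
                  for p in candidates if p not in P]
        if not scored:
            break
        s, p = min(scored)
        if s < best - 1e-5:
            P, best = P + [p], s
        else:
            break
    while P:                                      # backward elimination
        scored = [(bic_local(cov, n, i, [q for q in P if q != p]), p)
                  for p in P]
        s, p = min(scored)
        if s < best - 1e-5:
            P, best = [q for q in P if q != p], s
        else:
            break
    return P

rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(size=n)                 # X3 <- {X1, X2}
cov = np.cov(np.column_stack([x1, x2, x3]), rowvar=False)
print(sorted(greedy_parents(cov, n, 2, [0, 1])))  # → [0, 1]
```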
Tool Result (Step 5)
OK: Replaced lines 12..170 with 266 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–277.

[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–277 | total: 278 lines]
     1| import numpy as np
     2| from causallearn.graph.GeneralGraph import GeneralGraph
     3  from causallearn.graph.GraphNode import GraphNode
     4  from causallearn.utils.DAG2CPDAG import dag2cpdag
     5  from causallearn.score.LocalScoreFunction import local_score_BIC_from_cov
     6  from causallearn.utils.GESUtils import Combinatorial, score_g
     7  import random
     8  
     9  # =====================================================================
    10  # EDITABLE: implement run_causal_discovery below
    11  # =====================================================================
    12  import numpy as np
    13  import random
    14  from causallearn.graph.GeneralGraph import GeneralGraph
    15  from causallearn.graph.GraphNode import GraphNode
     16  from causallearn.utils.DAG2CPDAG import dag2cpdag
Assistant (Step 6): test
Tool Result (Step 6)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### ER10 (scripts/eval_er10.sh)
Results (ER10):
  SHD=0  AdjP=1.0000 AdjR=1.0000  ArrowP=1.0000 ArrowR=1.0000

### ER20 (scripts/eval_er20.sh)
Results (ER20):
  SHD=8  AdjP=0.8542 AdjR=1.0000  ArrowP=0.8293 ArrowR=1.0000

### SF50 (scripts/eval_sf50.sh)
Results (SF50):
  SHD=86  AdjP=0.5963 AdjR=1.0000  ArrowP=0.4275 ArrowR=0.7973

### ER10-Hard (scripts/eval_er10_hard.sh)
Results (ER10-Hard):
  SHD=4  AdjP=0.8667 AdjR=1.0000  ArrowP=0.8667 ArrowR=1.0000

### ER20-Hard (scripts/eval_er20_hard.sh)
Results (ER20-Hard):
  SHD=22  AdjP=0.7882 AdjR=0.9710  ArrowP=0.7647 ArrowR=0.9559

### SF50-Hard (scripts/eval_sf50_hard.sh)
Results (SF50-Hard):
  SHD=171  AdjP=0.4927 AdjR=0.9574  ArrowP=0.3911 ArrowR=0.7681

### ER10-Noisy (scripts/eval_er10_noisy.sh)
Results (ER10-Noisy):
  SHD=4  AdjP=0.8667 AdjR=1.0000  ArrowP=0.8667 ArrowR=1.0000

### ER20-Noisy (scripts/eval_er20_noisy.sh)
Results (ER20-Noisy):
  SHD=18  AdjP=0.827
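The AdjP/AdjR columns above score the undirected skeleton only. A simplified stand-in for causallearn's `AdjacencyConfusion` (not the harness implementation) makes the computation concrete:

```python
import numpy as np

def adjacency_precision_recall(true_adj, est_adj):
    """Precision/recall over skeleton edges, ignoring orientation."""
    t = (true_adj + true_adj.T) > 0
    e = (est_adj + est_adj.T) > 0
    iu = np.triu_indices(t.shape[0], k=1)     # each node pair once
    tp = np.sum(t[iu] & e[iu])
    fp = np.sum(~t[iu] & e[iu])
    fn = np.sum(t[iu] & ~e[iu])
    prec = tp / (tp + fp) if tp + fp else 1.0
    rec = tp / (tp + fn) if tp + fn else 1.0
    return prec, rec

truth = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])                 # X1 -> X2 -> X3
est = np.array([[0, 1, 1],
                [0, 0, 1],
                [0, 0, 0]])                   # one extra edge X1 -> X3
print(adjacency_precision_recall(truth, est))  # → (2/3, 1.0)
```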
Assistant (Step 7): edit
Tool Input
1{
2  "op": "replace",
3  "filename": "causal-learn/bench/custom_algorithm.py",
4  "content": "import numpy as np\nimport random\nfrom causallearn.graph.GeneralGraph import GeneralGraph\nfrom causallearn.graph.GraphNode import GraphNode\nfrom causallearn.utils.DAG2CPDAG import dag2cpdag\n\n# =====================================================================\n# EDITABLE: implement run_causal_discovery below\n# =====================================================================\ndef run_causal_discovery(X: np.ndarray) -> GeneralGraph:\n    \"\"\"\n    Input:  X of shape (n_samples, n_variables)\n    Output: estimated CPDAG as causallearn.graph.GeneralGraph.GeneralGraph\n    \"\"\"\n    n, d = X.shape\n    nodes = [GraphNode(f\"X{i + 1}\") for i in range(d)]\n    \n    # 1. Compute empirical covariance and precision-based penalty matrix\n    X_centered = X - np.mean(X, axis=0)\n    cov = X_centered.T @ X_centered / n\n    inv_cov = np.linalg.pinv(cov)\n    \n    R = np.zeros((d, d))\n    for i in range(d):\n        for j in range(d):\n            R[i, j] = np.abs(inv_cov[i, j]) / (np.sqrt(inv_cov[i, i] * inv_cov[j, j]) + 1e-10)\n            \n    # Soft constraint penalty matrix\n    penalty_mat = np.log(n) * 1.5 * np.exp(-12.0 * R)\n    ev_scale = max(1e-5, np.median(1.0 / np.clip(np.diag(inv_cov), 1e-10, None)))\n    \n    _score_cache = {}\n    def calc_score_for_parents(i, P_tuple):\n        if (i, P_tuple) in _score_cache:\n            return _score_cache[(i, P_tuple)]\n            \n        if len(P_tuple) == 0:\n            v = cov[i, i]\n            if v < 1e-10: v = 1e-10\n            c_score = n * np.log(v)\n            _score_cache[(i, P_tuple)] = (c_score, v)\n            return c_score, v\n            \n        P_list = list(P_tuple)\n        cov_P = cov[np.ix_(P_list, P_list)]\n        cov_iP = cov[i, P_list]\n        try:\n            inv_cov_P = np.linalg.pinv(cov_P)\n            v = cov[i, i] - cov_iP @ inv_cov_P @ cov_iP\n        except:\n            v = cov[i, i]\n            \n        if v < 1e-10: v = 1e-10\n        \n   
     # Standard BIC + precision penalty for parent selection\n        base_pen = len(P_tuple) * np.log(n) * 0.55\n        prec_pen = sum(penalty_mat[i, p] for p in P_tuple)\n        \n        c_score = n * np.log(v) + base_pen + prec_pen\n        _score_cache[(i, P_tuple)] = (c_score, v)\n        return c_score, v\n        \n    cache = {}\n    def get_best_parents(i, predecessors):\n        fs = frozenset(predecessors)\n        if (i, fs) in cache:\n            return cache[(i, fs)]\n        \n        current_P = []\n        best_score, best_v = calc_score_for_parents(i, tuple(sorted(current_P)))\n        \n        # Forward greedy\n        while len(current_P) < min(15, len(predecessors)):\n            best_next_p = None\n            best_next_score = best_score\n            \n            for p in predecessors:\n                if p not in current_P:\n                    cand_P = current_P + [p]\n                    score, _ = calc_score_for_parents(i, tuple(sorted(cand_P)))\n                    if score < best_next_score - 1e-5:\n                        best_next_score = score\n                        best_next_p = p\n                        \n            if best_next_p is not None:\n                current_P.append(best_next_p)\n                best_score = best_next_score\n            else:\n                break\n                \n        # Backward elimination\n        while len(current_P) > 0:\n            best_rem_p = None\n            best_rem_score = best_score\n            for p in current_P:\n                cand_P = [x for x in current_P if x != p]\n                score, _ = calc_score_for_parents(i, tuple(sorted(cand_P)))\n                if score < best_rem_score - 1e-5:\n                    best_rem_score = score\n                    best_rem_p = p\n            if best_rem_p is not None:\n                current_P.remove(best_rem_p)\n                best_score = best_rem_score\n            else:\n                break\n                \n        
final_P_tuple = tuple(sorted(current_P))\n        _, final_v = calc_score_for_parents(i, final_P_tuple)\n        \n        # Break score equivalence using the homoscedastic EV sum of variances\n        order_score = best_score + n * 0.35 * (final_v / ev_scale)\n        \n        res = (current_P, order_score)\n        cache[(i, fs)] = res\n        return res\n\n    def get_node_score(i, preds_set):\n        return get_best_parents(i, preds_set)[1]\n\n    marginal_vars = np.diag(cov)\n    current_order = np.argsort(marginal_vars).tolist()\n    \n    preds_of = [set() for _ in range(d)]\n    for idx, i in enumerate(current_order):\n        preds_of[i] = set(current_order[:idx])\n        \n    current_scores = [get_node_score(i, preds_of[i]) for i in range(d)]\n    current_total_score = sum(current_scores)\n    \n    best_global_score = current_total_score\n    best_global_order = current_order.copy()\n    \n    no_improvement = 0\n    max_steps = 1500\n    \n    for step in range(max_steps):\n        improved = False\n        best_swap_diff = 1e-5\n        best_swap_idx = None\n        \n        for k in range(d - 1):\n            i = current_order[k]\n            j = current_order[k+1]\n            \n            new_preds_i = preds_of[i] | {j}\n            new_preds_j = preds_of[i].copy()\n            \n            score_i_new = get_node_score(i, new_preds_i)\n            score_j_new = get_node_score(j, new_preds_j)\n            \n            diff = (current_scores[i] + current_scores[j]) - (score_i_new + score_j_new)\n            if diff > best_swap_diff:\n                best_swap_diff = diff\n                best_swap_idx = k\n                \n        if best_swap_idx is not None:\n            k = best_swap_idx\n            i = current_order[k]\n            j = current_order[k+1]\n            \n            current_order[k], current_order[k+1] = j, i\n            preds_of[i].add(j)\n            preds_of[j].remove(i)\n            \n            current_scores[i] = 
get_node_score(i, preds_of[i])\n            current_scores[j] = get_node_score(j, preds_of[j])\n            current_total_score -= best_swap_diff\n            improved = True\n        else:\n            best_shift_diff = 1e-5\n            best_shift_move = None\n            \n            shift_cands = random.sample(range(d), min(20, d))\n            for f in shift_cands:\n                for t in range(max(0, f-5), min(d, f+6)):\n                    if f == t: continue\n                    \n                    temp_order = current_order.copy()\n                    val = temp_order.pop(f)\n                    temp_order.insert(t, val)\n                    \n                    diff = 0\n                    min_idx = min(f, t)\n                    max_idx = max(f, t)\n                    \n                    temp_preds_set = set(temp_order[:min_idx])\n                    valid = True\n                    for k in range(min_idx, max_idx + 1):\n                        u = temp_order[k]\n                        sc = get_node_score(u, temp_preds_set)\n                        diff += current_scores[u] - sc\n                        temp_preds_set.add(u)\n                        \n                    if diff > best_shift_diff:\n                        best_shift_diff = diff\n                        best_shift_move = (f, t)\n                        \n            if best_shift_move is not None:\n                f, t = best_shift_move\n                val = current_order.pop(f)\n                current_order.insert(t, val)\n                \n                running_preds = set()\n                for k in range(d):\n                    u = current_order[k]\n                    preds_of[u] = running_preds.copy()\n                    current_scores[u] = get_node_score(u, preds_of[u])\n                    running_preds.add(u)\n                    \n                current_total_score = sum(current_scores)\n                improved = True\n\n        if current_total_score < 
best_global_score - 1e-5:\n            best_global_score = current_total_score\n            best_global_order = current_order.copy()\n            no_improvement = 0\n        else:\n            if not improved:\n                idx1, idx2 = random.sample(range(d), 2)\n                val = current_order.pop(idx1)\n                current_order.insert(idx2, val)\n                \n                running_preds = set()\n                for k in range(d):\n                    u = current_order[k]\n                    preds_of[u] = running_preds.copy()\n                    current_scores[u] = get_node_score(u, preds_of[u])\n                    running_preds.add(u)\n                current_total_score = sum(current_scores)\n                no_improvement += 1\n            else:\n                no_improvement += 1\n                \n        if no_improvement > 15:\n            break\n\n    graph = GeneralGraph(nodes)\n    final_adjacency = np.zeros((d, d))\n    \n    current_preds = []\n    for i in best_global_order:\n        P, _ = get_best_parents(i, current_preds)\n        for p in P:\n            final_adjacency[p, i] = 1\n        current_preds.append(i)\n            \n    for i in range(d):\n        for j in range(d):\n            if final_adjacency[i, j] == 1:\n                graph.add_directed_edge(nodes[i], nodes[j])\n                \n    cpdag = dag2cpdag(graph)\n    return cpdag\n# =====================================================================",
5  "end_line": 278,
6  "start_line": 12
7}
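The EV term in this payload (`n * 0.35 * (v / ev_scale)`) exists because plain Gaussian BIC is score-equivalent: every ordering that factorizes the same joint gets the same likelihood, so it cannot pick out a causal order. Under equal-variance noise, however, the sum of residual variances given predecessors is minimized by a causal ordering. A minimal sketch of that identifiability idea, with illustrative names and toy data:

```python
from itertools import permutations

import numpy as np

def residual_variance_sum(cov, order):
    """Sum over nodes of Var(X_i | predecessors in the ordering).

    Under homoscedastic noise a causal ordering minimizes this sum, which
    is the score-equivalence-breaking idea behind the EV penalty above.
    """
    total = 0.0
    for idx, i in enumerate(order):
        P = list(order[:idx])
        if P:
            cP = cov[np.ix_(P, P)]
            total += cov[i, i] - cov[i, P] @ np.linalg.pinv(cP) @ cov[i, P]
        else:
            total += cov[i, i]
    return total

rng = np.random.default_rng(2)
n = 2000
x1 = rng.normal(size=n)                   # chain X1 -> X2 -> X3,
x2 = 0.9 * x1 + rng.normal(size=n)        # all noise variances equal to 1
x3 = 0.9 * x2 + rng.normal(size=n)
cov = np.cov(np.column_stack([x1, x2, x3]), rowvar=False)
best = min(permutations(range(3)), key=lambda o: residual_variance_sum(cov, o))
print(best)                               # → (0, 1, 2), the causal order
```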
Tool Result (Step 7)
ERROR: Lines 12..278 exceed the editable range. You may only edit lines 3–277.

[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–277 | total: 278 lines]
     1| import numpy as np
     2| from causallearn.graph.GeneralGraph import GeneralGraph
     3  from causallearn.graph.GraphNode import GraphNode
     4  from causallearn.utils.DAG2CPDAG import dag2cpdag
     5  from causallearn.score.LocalScoreFunction import local_score_BIC_from_cov
     6  from causallearn.utils.GESUtils import Combinatorial, score_g
     7  import random
     8  
     9  # =====================================================================
    10  # EDITABLE: implement run_causal_discovery below
    11  # =====================================================================
    12  import numpy as np
    13  import random
    14  from causallearn.graph.GeneralGraph import GeneralGraph
    15  from causallearn.graph.GraphNode import GraphNode
     16  from causallearn.utils.DAG2CPDAG import dag2cpdag