Agent Conversation: google/gemini-3.1-pro-preview
Task: causal-observational-nonlinear
Causal Discovery: Observational Nonlinear Data
Objective
Implement a causal discovery algorithm that recovers the DAG structure from purely observational data generated by a nonlinear Additive Noise Model (ANM). Your code goes in bench/custom_algorithm.py.
Background
Additive Noise Models (ANMs) assume structural equations of the form x_j = f_j(parents(j)) + e_j, where the f_j are nonlinear functions and the e_j are mutually independent noise terms. Under mild conditions on f_j and the noise distribution, the causal DAG is identifiable from observational data alone (Hoyer et al., 2008; Peters et al., 2014). This goes beyond LiNGAM, which assumes linearity, and beyond PC and GES, which in their standard forms recover only a Markov equivalence class.
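A toy generator helps make the ANM setting concrete. The 3-node chain, the particular nonlinearities, and the Laplace noise below are illustrative choices, not the benchmark's actual data generator:

```python
import numpy as np

def sample_anm(n_samples, rng=None):
    """Sample from a toy 3-node ANM: x0 -> x1 -> x2.

    Each variable is a nonlinear function of its parents plus
    independent (here Laplace) noise, matching x_j = f_j(pa(j)) + e_j.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    e0 = rng.laplace(size=n_samples)
    e1 = rng.laplace(size=n_samples)
    e2 = rng.laplace(size=n_samples)
    x0 = e0                               # root node: pure noise
    x1 = np.tanh(x0) + e1                 # x1 = f1(x0) + e1
    x2 = np.sin(x1) + 0.5 * x1**2 + e2    # x2 = f2(x1) + e2
    return np.column_stack([x0, x1, x2])

X = sample_anm(500)
```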
Key challenges include: (1) testing independence in high-dimensional nonlinear settings, (2) scaling to larger graphs, (3) handling diverse nonlinear function types (MLP, GP, polynomial, sigmoid), and (4) robustness to different noise distributions including the harder Gaussian noise case.
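For challenge (1), a standard tool is the HSIC statistic between a candidate cause and the regression residuals. A minimal biased-estimator sketch with RBF kernels and median-heuristic bandwidths (the kernel and bandwidth choices are assumptions, not prescribed by the task):

```python
import numpy as np

def rbf_gram(x, sigma):
    """RBF Gram matrix for a 1-D sample."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma**2))

def hsic(x, y):
    """Biased HSIC statistic: trace(K H L H) / n^2, larger = more dependent."""
    n = len(x)

    def med_sigma(v):
        d = np.abs(v[:, None] - v[None, :])
        m = np.median(d[d > 0])
        return m if m > 0 else 1.0

    K = rbf_gram(x, med_sigma(x))
    L = rbf_gram(y, med_sigma(y))
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / n**2
```

In an ANM fit, a small HSIC between each parent and the residual supports the tested causal direction.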
Evaluation Scenarios
| Label | Graph type | Nodes | Samples | Noise | Nonlinearity |
|---|---|---|---|---|---|
| ER8-MLP | Erdos-Renyi | 8 | 500 | Exponential | MLP |
| ER12-GP | Erdos-Renyi | 12 | 1000 | Laplace | GP |
| SF10-Mixed | Scale-Free (BA) | 10 | 500 | Uniform | Mixed |
| ER15-Sigmoid | Erdos-Renyi | 15 | 1000 | Exponential | Sigmoid |
| ER20-Mixed | Erdos-Renyi | 20 | 2000 | Laplace | Mixed |
| SF20-GP | Scale-Free (BA) | 20 | 2000 | Exponential | GP |
| ER12-LowSample | Erdos-Renyi | 12 | 150 | Laplace | Mixed |
| ER20-Gauss | Erdos-Renyi | 20 | 2000 | Gaussian | Mixed |
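For reference, an Erdos-Renyi DAG like those in the table can be sampled by fixing a random topological order and adding only forward edges, which guarantees acyclicity (a standard construction; `edge_prob` is an illustrative parameter):

```python
import numpy as np

def sample_er_dag(n_nodes, edge_prob, rng=None):
    """Erdos-Renyi DAG: edges oriented along a random topological order."""
    rng = np.random.default_rng(0) if rng is None else rng
    order = rng.permutation(n_nodes)
    adj = np.zeros((n_nodes, n_nodes), dtype=int)
    for a in range(n_nodes):
        for b in range(a + 1, n_nodes):
            if rng.random() < edge_prob:
                adj[order[a], order[b]] = 1   # order[a] -> order[b]
    return adj
```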
Metrics
All computed on the directed edge set (skeleton + direction must be correct):
- F1 (primary ranking metric), SHD, Precision, Recall
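A sketch of how these metrics can be computed from binary directed adjacency matrices. The SHD convention here (a reversed edge counts once) is one common choice and may differ from the benchmark's exact scorer:

```python
import numpy as np

def directed_edge_metrics(B_est, B_true):
    """F1/precision/recall on directed edges, plus a simple SHD."""
    est = (B_est != 0).astype(int)
    true = (B_true != 0).astype(int)
    tp = int(np.sum(est * true))
    fp = int(np.sum(est * (1 - true)))
    fn = int(np.sum((1 - est) * true))
    prec = tp / max(tp + fp, 1)
    rec = tp / max(tp + fn, 1)
    f1 = 2 * prec * rec / max(prec + rec, 1e-12)
    # SHD: extra + missing edges, with each reversal counted once
    diff = np.abs(est - true)
    reversals = int(np.sum((diff * diff.T) * np.triu(np.ones_like(diff))))
    shd = int(diff.sum()) - reversals
    return dict(f1=f1, precision=prec, recall=rec, shd=shd)
```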
Baselines
- cam: Causal Additive Models (Buehlmann et al., 2014) -- score-based with GAM regression
- notears_mlp: NOTEARS with MLP parametrization (Zheng et al., 2020) -- continuous optimization
- directlingam: DirectLiNGAM (Shimizu et al., 2011) -- linear reference baseline
causal-learn/bench/custom_algorithm.py [EDITABLE — lines 3–14 only]
1: import numpy as np
2:
3: # =====================================================================
4: # EDITABLE: implement run_causal_discovery below
5: # =====================================================================
6: def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7: """
8: Input: X of shape (n_samples, n_variables)
9: Output: adjacency matrix B of shape (n_variables, n_variables)
10: B[i, j] != 0 means j -> i (follows causal-learn convention)
11: """
12: n = X.shape[1]
13: return np.zeros((n, n))
14: # =====================================================================
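The row/column convention in the docstring is easy to get backwards. A tiny sanity check for a hypothetical two-variable graph 0 -> 1:

```python
import numpy as np

# Edge 0 -> 1. In the causal-learn convention B[i, j] != 0 means j -> i,
# so the entry lives at row = child (1), column = parent (0).
B = np.zeros((2, 2))
B[1, 0] = 1.0

parents_of_1 = np.flatnonzero(B[1])  # read a node's parents from its row
```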
cam baseline — editable region [READ-ONLY — reference implementation]
Lines 3–14:
1: import numpy as np
Tool Input
1{
2 "end_line": 14,
3 "op": "replace",
4 "filename": "bench/custom_algorithm.py",
5 "content": " import warnings\n from sklearn.model_selection import KFold\n from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor\n from sklearn.preprocessing import StandardScaler\n\n warnings.filterwarnings(\"ignore\")\n\n n_samples, n_vars = X.shape\n\n # Standardize data to normalize density scales\n scaler = StandardScaler()\n X_scaled = scaler.fit_transform(X)\n\n # Entropy estimator parameter m\n m_spacing = max(2, int(np.sqrt(n_samples)) // 2)\n\n def calculate_entropy(residuals):\n n = len(residuals)\n if n < 2 * m_spacing + 5:\n return 0.5 * np.log(np.var(residuals) + 1e-8)\n res_sorted = np.sort(residuals)\n diff = res_sorted[2 * m_spacing:] - res_sorted[:-2 * m_spacing]\n diff = np.maximum(diff, 1e-9)\n return np.mean(np.log(diff))\n\n memo = {}\n\n def get_score(target, parents):\n parents_tuple = tuple(sorted(parents))\n if parents_tuple in memo.get(target, {}):\n return memo[target][parents_tuple]\n\n X_target = X_scaled[:, target]\n if len(parents) == 0:\n score = calculate_entropy(X_target - np.mean(X_target))\n else:\n X_features = X_scaled[:, parents]\n kf = KFold(n_splits=5, shuffle=True, random_state=42)\n residuals = np.zeros_like(X_target)\n\n if n_samples < 500:\n model = RandomForestRegressor(n_estimators=30, max_depth=5, random_state=42, n_jobs=1)\n else:\n model = HistGradientBoostingRegressor(max_iter=40, max_depth=5, min_samples_leaf=10, random_state=42)\n\n for train_index, test_index in kf.split(X_features):\n X_train, X_test = X_features[train_index], X_features[test_index]\n y_train, y_test = X_target[train_index], X_target[test_index]\n model.fit(X_train, y_train)\n residuals[test_index] = y_test - model.predict(X_test)\n\n # BIC-like complexity penalty for edges\n penalty = 0.05 * len(parents)\n score = calculate_entropy(residuals) + penalty\n\n if target not in memo:\n memo[target] = {}\n memo[target][parents_tuple] = score\n return score\n\n parents_list = [[] for _ in range(n_vars)]\n current_scores 
= [get_score(i, []) for i in range(n_vars)]\n\n def check_cycle(p_list):\n visited = [0] * n_vars\n rec_stack = [0] * n_vars\n\n def dfs(node):\n visited[node] = 1\n rec_stack[node] = 1\n for p in p_list[node]:\n if visited[p] == 0:\n if dfs(p): return True\n elif rec_stack[p] == 1:\n return True\n rec_stack[node] = 0\n return False\n\n for i in range(n_vars):\n if visited[i] == 0:\n if dfs(i): return True\n return False\n\n # Forward greedy search\n while True:\n best_reduction = 0\n best_edge = None\n best_edge_score = 0\n\n for p in range(n_vars):\n for j in range(n_vars):\n if p == j or p in parents_list[j]:\n continue\n\n parents_list[j].append(p)\n has_cycle = check_cycle(parents_list)\n parents_list[j].remove(p)\n\n if has_cycle:\n continue\n\n new_parents = parents_list[j] + [p]\n new_score = get_score(j, new_parents)\n reduction = current_scores[j] - new_score\n\n if reduction > best_reduction:\n best_reduction = reduction\n best_edge = (p, j)\n best_edge_score = new_score\n\n if best_reduction > 1e-4:\n p, j = best_edge\n parents_list[j].append(p)\n current_scores[j] = best_edge_score\n else:\n break\n\n # Backward pruning\n while True:\n best_reduction = 0\n best_edge = None\n best_edge_score = 0\n\n for j in range(n_vars):\n for p in parents_list[j]:\n new_parents = parents_list[j].copy()\n new_parents.remove(p)\n\n new_score = get_score(j, new_parents)\n reduction = current_scores[j] - new_score\n\n if reduction > best_reduction:\n best_reduction = reduction\n best_edge = (p, j)\n best_edge_score = new_score\n\n if best_reduction > 1e-4:\n p, j = best_edge\n parents_list[j].remove(p)\n current_scores[j] = best_edge_score\n else:\n break\n\n B = np.zeros((n_vars, n_vars))\n for j in range(n_vars):\n for p in parents_list[j]:\n B[j, p] = 1.0 # causal-learn format: target j gets from source p\n\n return B",
6 "after_line": 14,
7 "start_line": 12
8}
ERROR: Package 'bench' is not in allowed packages
Tool Input
1{
2 "end_line": 14,
3 "content": " import warnings\n from sklearn.model_selection import KFold\n from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor\n from sklearn.preprocessing import StandardScaler\n\n warnings.filterwarnings(\"ignore\")\n\n n_samples, n_vars = X.shape\n\n # Standardize data to normalize density scales\n scaler = StandardScaler()\n X_scaled = scaler.fit_transform(X)\n\n # Entropy estimator parameter m\n m_spacing = max(2, int(np.sqrt(n_samples)) // 2)\n\n def calculate_entropy(residuals):\n n = len(residuals)\n if n < 2 * m_spacing + 5:\n return 0.5 * np.log(np.var(residuals) + 1e-8)\n res_sorted = np.sort(residuals)\n diff = res_sorted[2 * m_spacing:] - res_sorted[:-2 * m_spacing]\n diff = np.maximum(diff, 1e-9)\n return np.mean(np.log(diff))\n\n memo = {}\n\n def get_score(target, parents):\n parents_tuple = tuple(sorted(parents))\n if parents_tuple in memo.get(target, {}):\n return memo[target][parents_tuple]\n\n X_target = X_scaled[:, target]\n if len(parents) == 0:\n score = calculate_entropy(X_target - np.mean(X_target))\n else:\n X_features = X_scaled[:, parents]\n kf = KFold(n_splits=5, shuffle=True, random_state=42)\n residuals = np.zeros_like(X_target)\n\n if n_samples < 500:\n model = RandomForestRegressor(n_estimators=30, max_depth=5, random_state=42, n_jobs=1)\n else:\n model = HistGradientBoostingRegressor(max_iter=40, max_depth=5, min_samples_leaf=10, random_state=42)\n\n for train_index, test_index in kf.split(X_features):\n X_train, X_test = X_features[train_index], X_features[test_index]\n y_train, y_test = X_target[train_index], X_target[test_index]\n model.fit(X_train, y_train)\n residuals[test_index] = y_test - model.predict(X_test)\n\n # BIC-like complexity penalty for edges\n penalty = 0.05 * len(parents)\n score = calculate_entropy(residuals) + penalty\n\n if target not in memo:\n memo[target] = {}\n memo[target][parents_tuple] = score\n return score\n\n parents_list = [[] for _ in range(n_vars)]\n current_scores 
= [get_score(i, []) for i in range(n_vars)]\n\n def check_cycle(p_list):\n visited = [0] * n_vars\n rec_stack = [0] * n_vars\n\n def dfs(node):\n visited[node] = 1\n rec_stack[node] = 1\n for p in p_list[node]:\n if visited[p] == 0:\n if dfs(p): return True\n elif rec_stack[p] == 1:\n return True\n rec_stack[node] = 0\n return False\n\n for i in range(n_vars):\n if visited[i] == 0:\n if dfs(i): return True\n return False\n\n # Forward greedy search\n while True:\n best_reduction = 0\n best_edge = None\n best_edge_score = 0\n\n for p in range(n_vars):\n for j in range(n_vars):\n if p == j or p in parents_list[j]:\n continue\n\n parents_list[j].append(p)\n has_cycle = check_cycle(parents_list)\n parents_list[j].remove(p)\n\n if has_cycle:\n continue\n\n new_parents = parents_list[j] + [p]\n new_score = get_score(j, new_parents)\n reduction = current_scores[j] - new_score\n\n if reduction > best_reduction:\n best_reduction = reduction\n best_edge = (p, j)\n best_edge_score = new_score\n\n if best_reduction > 1e-4:\n p, j = best_edge\n parents_list[j].append(p)\n current_scores[j] = best_edge_score\n else:\n break\n\n # Backward pruning\n while True:\n best_reduction = 0\n best_edge = None\n best_edge_score = 0\n\n for j in range(n_vars):\n for p in parents_list[j]:\n new_parents = parents_list[j].copy()\n new_parents.remove(p)\n\n new_score = get_score(j, new_parents)\n reduction = current_scores[j] - new_score\n\n if reduction > best_reduction:\n best_reduction = reduction\n best_edge = (p, j)\n best_edge_score = new_score\n\n if best_reduction > 1e-4:\n p, j = best_edge\n parents_list[j].remove(p)\n current_scores[j] = best_edge_score\n else:\n break\n\n B = np.zeros((n_vars, n_vars))\n for j in range(n_vars):\n for p in parents_list[j]:\n B[j, p] = 1.0 # causal-learn format: target j gets from source p\n\n return B",
4 "after_line": 14,
5 "filename": "causal-learn/bench/custom_algorithm.py",
6 "op": "replace",
7 "start_line": 12
8}
OK: Replaced lines 12..14 with 149 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–160.
[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–160 | total: 160 lines]
1| import numpy as np
2|
3 # =====================================================================
4 # EDITABLE: implement run_causal_discovery below
5 # =====================================================================
6 def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7 """
8 Input: X of shape (n_samples, n_variables)
9 Output: adjacency matrix B of shape (n_variables, n_variables)
10 B[i, j] != 0 means j -> i (follows causal-learn convention)
11 """
12 import warnings
13 from sklearn.model_selection import KFold
14 from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
15 from sklearn.preprocessing import StandardScaler
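The score in the tool call above is built on an m-spacing (Vasicek-style) entropy estimate of the residuals. Pulled out as a standalone sketch; like the original, it omits the log(n/2m) normalization term, which is constant across candidate parent sets and so cancels in score comparisons:

```python
import numpy as np

def m_spacing_entropy(residuals, m=None):
    """Vasicek-style m-spacing differential entropy estimate.

    Falls back to the Gaussian entropy formula (up to a constant)
    when the sample is too small for the spacing estimate.
    """
    r = np.asarray(residuals, dtype=float)
    n = len(r)
    if m is None:
        m = max(2, int(np.sqrt(n)) // 2)
    if n < 2 * m + 5:
        return 0.5 * np.log(np.var(r) + 1e-8)
    s = np.sort(r)
    diff = np.maximum(s[2 * m:] - s[:-2 * m], 1e-9)  # 2m-spacings
    return np.mean(np.log(diff))
```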
Tool Input
1{
2 "op": "replace",
3 "end_line": 14,
4 "content": "def run_causal_discovery(X: np.ndarray) -> np.ndarray:\n import numpy as np\n from sklearn.preprocessing import SplineTransformer\n from sklearn.linear_model import RidgeCV\n\n n_samples, n_vars = X.shape\n \n # Standardize X\n X = (X - np.mean(X, axis=0)) / (np.std(X, axis=0) + 1e-8)\n \n def check_path(adj, start, end):\n if start == end: return True\n visited = set([start])\n queue = [start]\n while queue:\n node = queue.pop(0)\n for child in range(n_vars):\n if adj[node, child] == 1:\n if child == end:\n return True\n if child not in visited:\n visited.add(child)\n queue.append(child)\n return False\n\n def compute_score(target, parents):\n y = X[:, target]\n if len(parents) == 0:\n var = np.var(y)\n return -n_samples/2 * np.log(max(var, 1e-12))\n \n X_pa = X[:, parents]\n try:\n st = SplineTransformer(n_knots=5, degree=3, include_bias=False)\n X_splines = st.fit_transform(X_pa)\n except Exception:\n X_splines = X_pa\n \n alphas = np.logspace(-3, 3, 10)\n model = RidgeCV(alphas=alphas, store_cv_values=True)\n model.fit(X_splines, y)\n \n cv_values = model.cv_values_\n if cv_values.ndim == 3:\n best_alpha_idx = np.argmin(np.mean(cv_values[:, 0, :], axis=0))\n loo_mse = np.mean(cv_values[:, 0, best_alpha_idx])\n else:\n best_alpha_idx = np.argmin(np.mean(cv_values, axis=0))\n loo_mse = np.mean(cv_values[:, best_alpha_idx])\n \n penalty = len(parents) * np.log(n_samples)\n \n return -n_samples/2 * np.log(max(loo_mse, 1e-12)) - penalty\n\n score_cache = {}\n def get_score(target, parents):\n p_tuple = tuple(sorted(parents))\n if (target, p_tuple) not in score_cache:\n score_cache[(target, p_tuple)] = compute_score(target, parents)\n return score_cache[(target, p_tuple)]\n \n adj = np.zeros((n_vars, n_vars), dtype=int)\n current_scores = np.zeros(n_vars)\n for i in range(n_vars):\n current_scores[i] = get_score(i, [])\n\n while True:\n best_gain = -1e-6\n best_edge = None\n best_score_new = None\n \n for i in range(n_vars):\n for j in range(n_vars):\n 
if i == j: continue\n if adj[i, j] == 1: continue\n if check_path(adj, j, i): \n continue\n \n pa_j = [p for p in range(n_vars) if adj[p, j] == 1]\n if len(pa_j) >= 6: \n continue\n pa_j.append(i)\n \n new_score = get_score(j, pa_j)\n gain = new_score - current_scores[j]\n \n if gain > best_gain:\n best_gain = gain\n best_edge = (i, j)\n best_score_new = new_score\n \n if best_edge is not None and best_gain > 0:\n i, j = best_edge\n adj[i, j] = 1\n current_scores[j] = best_score_new\n else:\n break\n \n while True:\n best_gain = -1e-6\n best_edge = None\n best_score_new = None\n \n for i in range(n_vars):\n for j in range(n_vars):\n if adj[i, j] == 1:\n pa_j = [p for p in range(n_vars) if adj[p, j] == 1]\n pa_j.remove(i)\n \n new_score = get_score(j, pa_j)\n gain = new_score - current_scores[j]\n \n if gain > best_gain:\n best_gain = gain\n best_edge = (i, j)\n best_score_new = new_score\n \n if best_edge is not None and best_gain > 0:\n i, j = best_edge\n adj[i, j] = 0\n current_scores[j] = best_score_new\n else:\n break\n\n B = adj.T.astype(float)\n return B",
5 "filename": "bench/custom_algorithm.py",
6 "after_line": 3,
7 "start_line": 6
8}
ERROR: Package 'bench' is not in allowed packages
Tool Input
1{
2 "filename": "causal-learn/bench/custom_algorithm.py",
3 "start_line": 6,
4 "content": "def run_causal_discovery(X: np.ndarray) -> np.ndarray:\n import numpy as np\n from sklearn.preprocessing import SplineTransformer\n from sklearn.linear_model import RidgeCV\n\n n_samples, n_vars = X.shape\n \n # Standardize X\n X = (X - np.mean(X, axis=0)) / (np.std(X, axis=0) + 1e-8)\n \n def check_path(adj, start, end):\n if start == end: return True\n visited = set([start])\n queue = [start]\n while queue:\n node = queue.pop(0)\n for child in range(n_vars):\n if adj[node, child] == 1:\n if child == end:\n return True\n if child not in visited:\n visited.add(child)\n queue.append(child)\n return False\n\n def compute_score(target, parents):\n y = X[:, target]\n if len(parents) == 0:\n var = np.var(y)\n return -n_samples/2 * np.log(max(var, 1e-12))\n \n X_pa = X[:, parents]\n try:\n st = SplineTransformer(n_knots=5, degree=3, include_bias=False)\n X_splines = st.fit_transform(X_pa)\n except Exception:\n X_splines = X_pa\n \n alphas = np.logspace(-3, 3, 10)\n model = RidgeCV(alphas=alphas, store_cv_values=True)\n model.fit(X_splines, y)\n \n cv_values = model.cv_values_\n if cv_values.ndim == 3:\n best_alpha_idx = np.argmin(np.mean(cv_values[:, 0, :], axis=0))\n loo_mse = np.mean(cv_values[:, 0, best_alpha_idx])\n else:\n best_alpha_idx = np.argmin(np.mean(cv_values, axis=0))\n loo_mse = np.mean(cv_values[:, best_alpha_idx])\n \n penalty = len(parents) * np.log(n_samples)\n \n return -n_samples/2 * np.log(max(loo_mse, 1e-12)) - penalty\n\n score_cache = {}\n def get_score(target, parents):\n p_tuple = tuple(sorted(parents))\n if (target, p_tuple) not in score_cache:\n score_cache[(target, p_tuple)] = compute_score(target, parents)\n return score_cache[(target, p_tuple)]\n \n adj = np.zeros((n_vars, n_vars), dtype=int)\n current_scores = np.zeros(n_vars)\n for i in range(n_vars):\n current_scores[i] = get_score(i, [])\n\n while True:\n best_gain = -1e-6\n best_edge = None\n best_score_new = None\n \n for i in range(n_vars):\n for j in range(n_vars):\n 
if i == j: continue\n if adj[i, j] == 1: continue\n if check_path(adj, j, i): \n continue\n \n pa_j = [p for p in range(n_vars) if adj[p, j] == 1]\n if len(pa_j) >= 6: \n continue\n pa_j.append(i)\n \n new_score = get_score(j, pa_j)\n gain = new_score - current_scores[j]\n \n if gain > best_gain:\n best_gain = gain\n best_edge = (i, j)\n best_score_new = new_score\n \n if best_edge is not None and best_gain > 0:\n i, j = best_edge\n adj[i, j] = 1\n current_scores[j] = best_score_new\n else:\n break\n \n while True:\n best_gain = -1e-6\n best_edge = None\n best_score_new = None\n \n for i in range(n_vars):\n for j in range(n_vars):\n if adj[i, j] == 1:\n pa_j = [p for p in range(n_vars) if adj[p, j] == 1]\n pa_j.remove(i)\n \n new_score = get_score(j, pa_j)\n gain = new_score - current_scores[j]\n \n if gain > best_gain:\n best_gain = gain\n best_edge = (i, j)\n best_score_new = new_score\n \n if best_edge is not None and best_gain > 0:\n i, j = best_edge\n adj[i, j] = 0\n current_scores[j] = best_score_new\n else:\n break\n\n B = adj.T.astype(float)\n return B",
5 "after_line": 3,
6 "op": "replace",
7 "end_line": 14
8}
OK: Replaced lines 6..14 with 126 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–131.
[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–131 | total: 131 lines]
1| import numpy as np
2|
3 # =====================================================================
4 # EDITABLE: implement run_causal_discovery below
5 # =====================================================================
6 def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7 import numpy as np
8 from sklearn.preprocessing import SplineTransformer
9 from sklearn.linear_model import RidgeCV
10
11 n_samples, n_vars = X.shape
12
13 # Standardize X
14 X = (X - np.mean(X, axis=0)) / (np.std(X, axis=0) + 1e-8)
15
16 def check_path(adj, start, end):
17 if start == end: return True
18 visited = set([start])
19 queue = [start]
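The acyclicity guard in this version tests reachability before committing an edge: adding i -> j is illegal exactly when j already reaches i. As a standalone sketch:

```python
import numpy as np
from collections import deque

def creates_cycle(adj, i, j):
    """True iff adding edge i -> j would close a cycle, i.e. j reaches i.

    adj[a, b] == 1 means a -> b; BFS from j over existing edges.
    """
    seen, queue = {j}, deque([j])
    while queue:
        node = queue.popleft()
        for child in np.flatnonzero(adj[node]):
            if child == i:
                return True
            if child not in seen:
                seen.add(int(child))
                queue.append(int(child))
    return False
```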
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
### ER8-MLP (scripts/eval_er8_mlp.sh)
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/causal-observational-nonlinear/google_gemini-3.1-pro-preview_20260402_183953/causal-learn/bench/run_eval.py", line 63, in <module>
main()
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/causal-observational-nonlinear/google_gemini-3.1-pro-preview_20260402_183953/causal-learn/bench/run_eval.py", line 49, in main
B_est = run_causal_discovery(X)
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/causal-observational-nonlinear/google_gemini-3.1-pro-preview_20260402_183953/causal-learn/bench/custom_algorithm.py", line 89, in run_causal_discovery
new_score = get_score(j, pa_j)
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/causal-observational-nonlinear/google_gemini-3.1-pro-preview_20260402_183953/causal-learn/bench/custom_algorithm.py", line 64, in get_score
score
Tool Input
1{
2 "after_line": 3,
3 "filename": "causal-learn/bench/custom_algorithm.py",
4 "start_line": 6,
5 "end_line": 101,
6 "content": "def run_causal_discovery(X: np.ndarray) -> np.ndarray:\n import numpy as np\n from sklearn.preprocessing import StandardScaler, SplineTransformer\n from sklearn.linear_model import LinearRegression\n\n n_samples, n_vars = X.shape\n \n X = (X - np.mean(X, axis=0)) / (np.std(X, axis=0) + 1e-8)\n \n def check_path(adj, start, end):\n if start == end: return True\n visited = set([start])\n queue = [start]\n while queue:\n node = queue.pop(0)\n for child in range(n_vars):\n if adj[node, child] == 1:\n if child == end:\n return True\n if child not in visited:\n visited.add(child)\n queue.append(child)\n return False\n\n def compute_score(target, parents):\n y = X[:, target]\n if len(parents) == 0:\n var = np.var(y)\n return -n_samples/2 * np.log(max(var, 1e-12))\n \n X_pa = X[:, parents]\n try:\n st = SplineTransformer(n_knots=5, degree=3, include_bias=False)\n X_splines = st.fit_transform(X_pa)\n except Exception:\n X_splines = X_pa\n \n model = LinearRegression()\n model.fit(X_splines, y)\n preds = model.predict(X_splines)\n mse = np.mean((y - preds)**2)\n \n df = X_splines.shape[1]\n penalty = df * np.log(n_samples) / 2\n \n return -n_samples/2 * np.log(max(mse, 1e-12)) - penalty\n\n score_cache = {}\n def get_score(target, parents):\n p_tuple = tuple(sorted(parents))\n if (target, p_tuple) not in score_cache:\n score_cache[(target, p_tuple)] = compute_score(target, parents)\n return score_cache[(target, p_tuple)]\n \n adj = np.zeros((n_vars, n_vars), dtype=int)\n current_scores = np.zeros(n_vars)\n for i in range(n_vars):\n current_scores[i] = get_score(i, [])\n\n while True:\n best_gain = -1e-6\n best_edge = None\n best_score_new = None\n \n for i in range(n_vars):\n for j in range(n_vars):\n if i == j: continue\n if adj[i, j] == 1: continue\n if check_path(adj, j, i): \n continue\n \n pa_j = [p for p in range(n_vars) if adj[p, j] == 1]\n if len(pa_j) >= 6: \n continue\n pa_j.append(i)\n \n new_score = get_score(j, pa_j)\n gain = new_score - 
current_scores[j]\n \n if gain > best_gain:\n best_gain = gain\n best_edge = (i, j)\n best_score_new = new_score\n \n if best_edge is not None and best_gain > 0:\n i, j = best_edge\n adj[i, j] = 1\n current_scores[j] = best_score_new\n else:\n break\n \n while True:\n best_gain = -1e-6\n best_edge = None\n best_score_new = None\n \n for i in range(n_vars):\n for j in range(n_vars):\n if adj[i, j] == 1:\n pa_j = [p for p in range(n_vars) if adj[p, j] == 1]\n pa_j.remove(i)\n \n new_score = get_score(j, pa_j)\n gain = new_score - current_scores[j]\n \n if gain > best_gain:\n best_gain = gain\n best_edge = (i, j)\n best_score_new = new_score\n \n if best_edge is not None and best_gain > 0:\n i, j = best_edge\n adj[i, j] = 0\n current_scores[j] = best_score_new\n else:\n break\n\n B = adj.T.astype(float)\n return B",
7 "op": "replace"
8}
OK: Replaced lines 6..101 with 119 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–154.
[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–154 | total: 154 lines]
1| import numpy as np
2|
3 # =====================================================================
4 # EDITABLE: implement run_causal_discovery below
5 # =====================================================================
6 def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7 import numpy as np
8 from sklearn.preprocessing import StandardScaler, SplineTransformer
9 from sklearn.linear_model import LinearRegression
10
11 n_samples, n_vars = X.shape
12
13 X = (X - np.mean(X, axis=0)) / (np.std(X, axis=0) + 1e-8)
14
15 def check_path(adj, start, end):
16 if start == end: return True
17 visited = set([start])
18 queue = [start]
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### ER8-MLP (scripts/eval_er8_mlp.sh)
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/causal-observational-nonlinear/google_gemini-3.1-pro-preview_20260402_183953/causal-learn/bench/run_eval.py", line 10, in <module>
from custom_algorithm import run_causal_discovery
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/causal-observational-nonlinear/google_gemini-3.1-pro-preview_20260402_183953/causal-learn/bench/custom_algorithm.py", line 125
break
IndentationError: unexpected indent
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/causal-observational-nonlinear/google_gemini-3.1-pro-preview_20260402_183953/.local_scripts/scripts/eval_er8_mlp.sh` failed. (See above for error)
### ER12-GP (scripts/eval_er12_gp.sh)
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/causal-observa
Tool Input
1{
2 "end_line": 98,
3 "op": "replace",
4 "content": "def run_causal_discovery(X: np.ndarray) -> np.ndarray:\n import numpy as np\n from sklearn.linear_model import LinearRegression\n\n n_samples, n_vars = X.shape\n \n # Standardize\n X = (X - np.mean(X, axis=0)) / (np.std(X, axis=0) + 1e-8)\n \n def check_path(adj, start, end):\n if start == end: return True\n visited = set([start])\n queue = [start]\n while queue:\n node = queue.pop(0)\n for child in range(n_vars):\n if adj[node, child] == 1:\n if child == end:\n return True\n if child not in visited:\n visited.add(child)\n queue.append(child)\n return False\n\n def compute_score(target, parents):\n y = X[:, target]\n if len(parents) == 0:\n var = np.var(y)\n return -n_samples/2 * np.log(max(var, 1e-12))\n \n X_pa = X[:, parents]\n try:\n from sklearn.preprocessing import SplineTransformer\n st = SplineTransformer(n_knots=5, degree=3, include_bias=False)\n X_feats = st.fit_transform(X_pa)\n except ImportError:\n from sklearn.preprocessing import PolynomialFeatures\n pf = PolynomialFeatures(degree=3, include_bias=False)\n X_feats = pf.fit_transform(X_pa)\n \n model = LinearRegression()\n model.fit(X_feats, y)\n preds = model.predict(X_feats)\n mse = np.mean((y - preds)**2)\n \n df = X_feats.shape[1]\n penalty = df * np.log(n_samples) / 2\n \n return -n_samples/2 * np.log(max(mse, 1e-12)) - penalty\n\n score_cache = {}\n def get_score(target, parents):\n p_tuple = tuple(sorted(parents))\n if (target, p_tuple) not in score_cache:\n score_cache[(target, p_tuple)] = compute_score(target, parents)\n return score_cache[(target, p_tuple)]\n \n adj = np.zeros((n_vars, n_vars), dtype=int)\n current_scores = np.zeros(n_vars)\n for i in range(n_vars):\n current_scores[i] = get_score(i, [])\n\n while True:\n best_gain = -1e-6\n best_edge = None\n best_score_new = None\n \n for i in range(n_vars):\n for j in range(n_vars):\n if i == j: continue\n if adj[i, j] == 1: continue\n if check_path(adj, j, i): \n continue\n \n pa_j = [p for p in range(n_vars) if adj[p, j] == 
1]\n if len(pa_j) >= 6: \n continue\n pa_j.append(i)\n \n new_score = get_score(j, pa_j)\n gain = new_score - current_scores[j]\n \n if gain > best_gain:\n best_gain = gain\n best_edge = (i, j)\n best_score_new = new_score\n \n if best_edge is not None and best_gain > 0:\n i, j = best_edge\n adj[i, j] = 1\n current_scores[j] = best_score_new\n else:\n break\n \n # Pruning phase\n while True:\n best_gain = -1e-6\n best_edge = None\n best_score_new = None\n \n for i in range(n_vars):\n for j in range(n_vars):\n if adj[i, j] == 1:\n pa_j = [p for p in range(n_vars) if adj[p, j] == 1]\n pa_j.remove(i)\n \n new_score = get_score(j, pa_j)\n gain = new_score - current_scores[j]\n \n if gain > best_gain:\n best_gain = gain\n best_edge = (i, j)\n best_score_new = new_score\n \n if best_edge is not None and best_gain > 0:\n i, j = best_edge\n adj[i, j] = 0\n current_scores[j] = best_score_new\n else:\n break\n\n # causallearn convention: B[i, j] != 0 means j -> i\n # Our adj[i, j] == 1 means i -> j. So B = adj.T\n B = adj.T.astype(float)\n return B",
5 "after_line": 3,
6 "start_line": 6,
7 "filename": "causal-learn/bench/custom_algorithm.py"
8}
OK: Replaced lines 6..98 with 125 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–186.
[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–186 | total: 186 lines]
1| import numpy as np
2|
3 # =====================================================================
4 # EDITABLE: implement run_causal_discovery below
5 # =====================================================================
6 def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7 import numpy as np
8 from sklearn.linear_model import LinearRegression
9
10 n_samples, n_vars = X.shape
11
12 # Standardize
13 X = (X - np.mean(X, axis=0)) / (np.std(X, axis=0) + 1e-8)
14
15 def check_path(adj, start, end):
16 if start == end: return True
17 visited = set([start])
18 queue = [start]
19 while queue:
20 node = qu
[Test #3] (0 tests remaining)
### ER8-MLP (scripts/eval_er8_mlp.sh)
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/causal-observational-nonlinear/google_gemini-3.1-pro-preview_20260402_183953/causal-learn/bench/run_eval.py", line 10, in <module>
from custom_algorithm import run_causal_discovery
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/causal-observational-nonlinear/google_gemini-3.1-pro-preview_20260402_183953/causal-learn/bench/custom_algorithm.py", line 131
best_edge = None
IndentationError: unexpected indent
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/causal-observational-nonlinear/google_gemini-3.1-pro-preview_20260402_183953/.local_scripts/scripts/eval_er8_mlp.sh` failed. (See above for error)
### ER12-GP (scripts/eval_er12_gp.sh)
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/causal-observational-nonlinear/google_gemi