Agent Conversation: gpt-5.4-pro
Task: causal-observational-nonlinear
Causal Discovery: Observational Nonlinear Data
Objective
Implement a causal discovery algorithm that recovers the DAG structure from purely observational data generated by a nonlinear Additive Noise Model (ANM). Your code goes in bench/custom_algorithm.py.
Background
Additive Noise Models (ANMs) assume structural equations of the form x_j = f_j(parents(j)) + e_j, where the f_j are nonlinear functions and the e_j are mutually independent noise terms. Under mild conditions on f_j and the noise distribution, the causal DAG is identifiable from observational data alone (Hoyer et al., 2008; Peters et al., 2014). This identifiability goes beyond what LiNGAM (which assumes linearity) or the standard linear-Gaussian implementations of PC and GES can recover.
Key challenges include: (1) testing independence in high-dimensional nonlinear settings, (2) scaling to larger graphs, (3) handling diverse nonlinear function types (MLP, GP, polynomial, sigmoid), and (4) robustness to different noise distributions including the harder Gaussian noise case.
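As a concrete illustration of the model class (a minimal sketch only; the benchmark's actual generator, graphs, function families, and noise scales are not shown here), a three-node ANM can be simulated as:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical 3-node ANM: x0 -> x1 -> x2, nonlinear f_j, non-Gaussian noise.
x0 = rng.laplace(size=n)
x1 = np.tanh(2.0 * x0) + rng.laplace(scale=0.5, size=n)  # x1 = f1(x0) + e1
x2 = x1 ** 2 / 2.0 + rng.laplace(scale=0.5, size=n)      # x2 = f2(x1) + e2

X = np.column_stack([x0, x1, x2])  # shape (n_samples, n_variables)
```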
Evaluation Scenarios
| Label | Graph type | Nodes | Samples | Noise | Nonlinearity |
|---|---|---|---|---|---|
| ER8-MLP | Erdos-Renyi | 8 | 500 | Exponential | MLP |
| ER12-GP | Erdos-Renyi | 12 | 1000 | Laplace | GP |
| SF10-Mixed | Scale-Free (BA) | 10 | 500 | Uniform | Mixed |
| ER15-Sigmoid | Erdos-Renyi | 15 | 1000 | Exponential | Sigmoid |
| ER20-Mixed | Erdos-Renyi | 20 | 2000 | Laplace | Mixed |
| SF20-GP | Scale-Free (BA) | 20 | 2000 | Exponential | GP |
| ER12-LowSample | Erdos-Renyi | 12 | 150 | Laplace | Mixed |
| ER20-Gauss | Erdos-Renyi | 20 | 2000 | Gaussian | Mixed |
Metrics
All computed on the directed edge set (skeleton + direction must be correct):
- F1 (primary ranking metric), SHD, Precision, Recall
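Over directed edges these metrics reduce to set operations on the true and predicted adjacency matrices. A minimal sketch (hypothetical helper; the benchmark's own scorer may treat reversed edges differently, whereas this variant counts a reversal as two errors):

```python
import numpy as np

def directed_edge_metrics(B_true: np.ndarray, B_pred: np.ndarray):
    """SHD, precision, recall, F1 over directed edges of 0/1 adjacency matrices."""
    true_edges = set(zip(*np.nonzero(B_true)))
    pred_edges = set(zip(*np.nonzero(B_pred)))
    tp = len(true_edges & pred_edges)
    precision = tp / len(pred_edges) if pred_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    shd = len(true_edges ^ pred_edges)  # symmetric difference: a reversed edge counts twice
    return shd, precision, recall, f1
```

For example, for the true chain 0 -> 1 -> 2 (B[i, j] != 0 means j -> i), predicting only 0 -> 1 gives SHD=1, Precision=1.0, Recall=0.5, F1≈0.667.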
Baselines
- cam: Causal Additive Models (Buehlmann et al., 2014) -- score-based with GAM regression
- notears_mlp: NOTEARS with MLP parametrization (Zheng et al., 2020) -- continuous optimization
- directlingam: DirectLiNGAM (Shimizu et al., 2011) -- linear reference baseline
causal-learn/bench/custom_algorithm.py [EDITABLE — lines 3–14 only]
1: import numpy as np
2:
3: # =====================================================================
4: # EDITABLE: implement run_causal_discovery below
5: # =====================================================================
6: def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7: """
8: Input: X of shape (n_samples, n_variables)
9: Output: adjacency matrix B of shape (n_variables, n_variables)
10: B[i, j] != 0 means j -> i (follows causal-learn convention)
11: """
12: n = X.shape[1]
13: return np.zeros((n, n))
14: # =====================================================================
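To make the docstring's convention concrete: for a hypothetical chain x0 -> x1 -> x2, the expected output matrix is:

```python
import numpy as np

# causal-learn convention: B[i, j] != 0 means j -> i (row = child, column = parent)
B = np.zeros((3, 3))
B[1, 0] = 1.0  # edge x0 -> x1
B[2, 1] = 1.0  # edge x1 -> x2
```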
cam baseline — editable region [READ-ONLY — reference implementation]
Lines 3–14:
1: import numpy as np
Tool Input
1{
2 "op": "replace",
3 "filename": "causal-learn/bench/custom_algorithm.py",
4 "content": "# =====================================================================\n# EDITABLE: implement run_causal_discovery below\n# =====================================================================\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n \"\"\"\n Input: X of shape (n_samples, n_variables)\n Output: adjacency matrix B of shape (n_variables, n_variables)\n B[i, j] != 0 means j -> i (follows causal-learn convention)\n \"\"\"\n import os\n from itertools import combinations\n from sklearn.ensemble import ExtraTreesRegressor, HistGradientBoostingRegressor\n from sklearn.feature_selection import mutual_info_regression\n from sklearn.model_selection import KFold\n\n # Keep thread usage low when multiple seeds are evaluated in parallel.\n os.environ.setdefault(\"OMP_NUM_THREADS\", \"1\")\n os.environ.setdefault(\"OPENBLAS_NUM_THREADS\", \"1\")\n os.environ.setdefault(\"MKL_NUM_THREADS\", \"1\")\n\n X = np.asarray(X, dtype=float)\n n, d = X.shape\n if d == 0:\n return np.zeros((0, 0))\n if d == 1:\n return np.zeros((1, 1))\n\n # Standardize for stable nonlinear regression / dependence testing.\n X = X.copy()\n X -= X.mean(axis=0, keepdims=True)\n scale = X.std(axis=0, keepdims=True)\n scale[scale < 1e-8] = 1.0\n X /= scale\n\n rng = np.random.default_rng(0)\n dep_idx = np.arange(n) if n <= 384 else rng.choice(n, size=384, replace=False)\n obj_cache = {}\n\n def make_model() -> HistGradientBoostingRegressor:\n return HistGradientBoostingRegressor(\n loss=\"squared_error\",\n learning_rate=0.06,\n max_iter=60 if n >= 500 else 100,\n max_depth=3,\n min_samples_leaf=max(5, n // 80),\n l2_regularization=1e-3,\n random_state=0,\n )\n\n def crossfit_residual(y: np.ndarray, Z: np.ndarray) -> np.ndarray:\n if Z.ndim == 1:\n Z = Z[:, None]\n if Z.shape[1] == 0:\n return y - y.mean()\n\n resid = np.zeros_like(y, dtype=float)\n kf = KFold(n_splits=2, shuffle=True, random_state=0)\n for tr, te in kf.split(Z):\n model = make_model()\n model.fit(Z[tr], 
y[tr])\n resid[te] = y[te] - model.predict(Z[te])\n return resid\n\n def dep_score(Z: np.ndarray, r: np.ndarray) -> float:\n if Z.ndim == 1:\n Z = Z[:, None]\n if Z.shape[1] == 0:\n return 0.0\n\n Zs = Z[dep_idx]\n rs = r[dep_idx]\n rs = rs - rs.mean()\n rs_std = float(rs.std())\n if rs_std < 1e-10:\n return 0.0\n rs = rs / rs_std\n\n try:\n mi = mutual_info_regression(\n Zs,\n rs,\n discrete_features=False,\n n_neighbors=5 if len(dep_idx) >= 200 else 3,\n random_state=0,\n )\n except Exception:\n mi = np.zeros(Zs.shape[1], dtype=float)\n\n corr = np.zeros(Zs.shape[1], dtype=float)\n for k in range(Zs.shape[1]):\n zk = Zs[:, k]\n if np.std(zk) < 1e-10:\n continue\n c = np.corrcoef(zk, rs)[0, 1]\n corr[k] = 0.0 if not np.isfinite(c) else abs(c)\n\n scores = np.nan_to_num(mi, nan=0.0, posinf=0.0, neginf=0.0) + 0.35 * corr\n topk = min(3, len(scores))\n if topk == 0:\n return 0.0\n top = np.partition(scores, len(scores) - topk)[-topk:]\n return float(0.7 * np.max(top) + 0.3 * np.mean(top))\n\n def objective(target: int, parents: tuple, context: tuple) -> float:\n key = (target, parents, context)\n if key in obj_cache:\n return obj_cache[key]\n\n y = X[:, target]\n resid = crossfit_residual(y, X[:, parents]) if len(parents) > 0 else (y - y.mean())\n dep = dep_score(X[:, context], resid) if len(context) > 0 else 0.0\n mse = float(np.mean(resid ** 2))\n # Dependence is primary; fit and tiny complexity penalty stabilize selection.\n value = dep + 0.12 * mse + 0.01 * len(parents) / max(1, len(context) + 1)\n obj_cache[key] = value\n return value\n\n # -----------------------------------------------------------------\n # Stage 1: estimate a topological order by repeated sink peeling.\n # Novelty: out-of-fold residual-independence scoring prevents overfit.\n # -----------------------------------------------------------------\n remaining = list(range(d))\n reverse_order = []\n while remaining:\n if len(remaining) == 1:\n reverse_order.append(remaining.pop())\n break\n\n 
best_node = None\n best_score = np.inf\n for j in remaining:\n predictors = tuple(k for k in remaining if k != j)\n resid = crossfit_residual(X[:, j], X[:, predictors])\n score = dep_score(X[:, predictors], resid) + 0.12 * float(np.mean(resid ** 2))\n if score < best_score:\n best_score = score\n best_node = j\n\n reverse_order.append(best_node)\n remaining.remove(best_node)\n\n order = reverse_order[::-1]\n\n # -----------------------------------------------------------------\n # Stage 2: order-constrained parent selection.\n # Novelty: knockoff-ranked nonlinear search + pairwise-synergy insertion.\n # -----------------------------------------------------------------\n B = np.zeros((d, d), dtype=float)\n min_gain = 0.006 + 0.02 / np.sqrt(max(n, 2))\n\n for pos, target in enumerate(order):\n context = tuple(order[:pos])\n if len(context) == 0:\n continue\n\n local_rng = np.random.default_rng(1234 + target)\n Z_real = X[:, context]\n Z_fake = np.column_stack([local_rng.permutation(Z_real[:, i]) for i in range(Z_real.shape[1])])\n ranker = ExtraTreesRegressor(\n n_estimators=96 if n < 1000 else 64,\n max_depth=5,\n min_samples_leaf=max(2, n // 150),\n random_state=0,\n n_jobs=1,\n )\n ranker.fit(np.hstack([Z_real, Z_fake]), X[:, target])\n imp = ranker.feature_importances_\n comp = imp[: len(context)] - imp[len(context) :]\n order_idx = np.argsort(-comp)\n cand_order = [context[i] for i in order_idx]\n\n selected = tuple()\n current = objective(target, selected, context)\n\n while True:\n best_set = None\n best_value = current\n\n # Best single addition.\n for cand in cand_order:\n if cand in selected:\n continue\n trial = tuple(sorted(selected + (cand,)))\n value = objective(target, trial, context)\n if value < best_value:\n best_value = value\n best_set = trial\n\n # Best pair addition among the strongest remaining candidates.\n top_left = [c for c in cand_order if c not in selected][:5]\n for a, b in combinations(top_left, 2):\n trial = tuple(sorted(selected + (a, 
b)))\n value = objective(target, trial, context)\n if value < best_value:\n best_value = value\n best_set = trial\n\n if best_set is None or current - best_value <= min_gain:\n break\n selected, current = best_set, best_value\n\n # Backward cleanup.\n changed = True\n while changed and len(selected) > 0:\n changed = False\n for cand in list(selected):\n trial = tuple(x for x in selected if x != cand)\n value = objective(target, trial, context)\n if value <= current + 0.003:\n selected, current = trial, value\n changed = True\n break\n\n for parent in selected:\n B[target, parent] = 1.0\n\n return B\n# =====================================================================",
5 "after_line": 14,
6 "start_line": 3,
7 "end_line": 14
8}
OK: Replaced lines 3..14 with 218 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–220.
[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–220 | total: 220 lines]
1| import numpy as np
2|
3 # =====================================================================
4 # EDITABLE: implement run_causal_discovery below
5 # =====================================================================
6 def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7 """
8 Input: X of shape (n_samples, n_variables)
9 Output: adjacency matrix B of shape (n_variables, n_variables)
10 B[i, j] != 0 means j -> i (follows causal-learn convention)
11 """
12 import os
13 from itertools import combinations
14 from sklearn.ensemble import ExtraTreesRegressor, HistGradientBoostingRegressor
15 from sklearn.feature_selection import mutual_info_regression
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
### ER8-MLP (scripts/eval_er8_mlp.sh)
Results (ER8-MLP):
SHD=6 F1=0.6316 Precision=0.5000 Recall=0.8571
### ER12-GP (scripts/eval_er12_gp.sh)
Results (ER12-GP):
SHD=17 F1=0.6538 Precision=0.5152 Recall=0.8947
### SF10-Mixed (scripts/eval_sf10_mixed.sh)
Results (SF10-Mixed):
SHD=13 F1=0.6341 Precision=0.5200 Recall=0.8125
### ER15-Sigmoid (scripts/eval_er15_sigmoid.sh)
Results (ER15-Sigmoid):
SHD=29 F1=0.5867 Precision=0.4490 Recall=0.8462
### ER20-Mixed (scripts/eval_er20_mixed.sh)
Results (ER20-Mixed):
SHD=58 F1=0.3860 Precision=0.3929 Recall=0.3793
### SF20-GP (scripts/eval_sf20_gp.sh)
Results (SF20-GP):
SHD=37 F1=0.5567 Precision=0.4426 Recall=0.7500
### ER12-LowSample (scripts/eval_er12_lowsample.sh)
Results (ER12-LowSample):
SHD=17 F1=0.5652 Precision=0.4815 Recall=0.6842
[Leaderboard] Results saved: {'shd_ER8-MLP': 6, 'f1_ER8-MLP': 0.6316, 'precision_ER8-MLP': 0.5, 're
Tool Input
1{
2 "op": "replace",
3 "filename": "causal-learn/bench/custom_algorithm.py",
4 "content": "# =====================================================================\n# EDITABLE: implement run_causal_discovery below\n# =====================================================================\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n \"\"\"\n Input: X of shape (n_samples, n_variables)\n Output: adjacency matrix B of shape (n_variables, n_variables)\n B[i, j] != 0 means j -> i (follows causal-learn convention)\n \"\"\"\n import os\n from itertools import combinations\n from sklearn.ensemble import ExtraTreesRegressor, HistGradientBoostingRegressor\n from sklearn.feature_selection import mutual_info_regression\n from sklearn.model_selection import KFold\n\n os.environ.setdefault(\"OMP_NUM_THREADS\", \"1\")\n os.environ.setdefault(\"OPENBLAS_NUM_THREADS\", \"1\")\n os.environ.setdefault(\"MKL_NUM_THREADS\", \"1\")\n\n X = np.asarray(X, dtype=float)\n n, d = X.shape\n if d == 0:\n return np.zeros((0, 0))\n if d == 1:\n return np.zeros((1, 1))\n\n X = X.copy()\n X -= X.mean(axis=0, keepdims=True)\n scale = X.std(axis=0, keepdims=True)\n scale[scale < 1e-8] = 1.0\n X /= scale\n\n rng = np.random.default_rng(0)\n dep_idx = np.arange(n) if n <= 256 else rng.choice(n, size=256, replace=False)\n\n resid_cache = {}\n obj_cache = {}\n cond_cache = {}\n pair_cache = {}\n\n def make_model() -> HistGradientBoostingRegressor:\n return HistGradientBoostingRegressor(\n loss=\"squared_error\",\n learning_rate=0.06,\n max_iter=50 if n >= 1000 else (70 if n >= 400 else 100),\n max_depth=3,\n min_samples_leaf=max(5, n // 80),\n l2_regularization=1e-3,\n random_state=0,\n )\n\n def crossfit_residual(y: np.ndarray, Z: np.ndarray) -> np.ndarray:\n if Z.ndim == 1:\n Z = Z[:, None]\n if Z.shape[1] == 0:\n return y - y.mean()\n\n resid = np.zeros_like(y, dtype=float)\n kf = KFold(n_splits=2, shuffle=True, random_state=0)\n for tr, te in kf.split(Z):\n model = make_model()\n model.fit(Z[tr], y[tr])\n resid[te] = y[te] - model.predict(Z[te])\n return resid\n\n 
def get_resid(var: int, cond: tuple) -> np.ndarray:\n cond = tuple(sorted(cond))\n key = (var, cond)\n if key not in resid_cache:\n resid_cache[key] = crossfit_residual(X[:, var], X[:, cond]) if len(cond) > 0 else (X[:, var] - X[:, var].mean())\n return resid_cache[key]\n\n def dep_score(Z: np.ndarray, r: np.ndarray) -> float:\n if Z.ndim == 1:\n Z = Z[:, None]\n if Z.shape[1] == 0:\n return 0.0\n\n Zs = Z[dep_idx]\n rs = r[dep_idx]\n rs = rs - rs.mean()\n rs_std = float(rs.std())\n if rs_std < 1e-10:\n return 0.0\n rs = rs / rs_std\n\n try:\n mi = mutual_info_regression(\n Zs,\n rs,\n discrete_features=False,\n n_neighbors=5 if len(dep_idx) >= 160 else 3,\n random_state=0,\n )\n except Exception:\n mi = np.zeros(Zs.shape[1], dtype=float)\n\n corr = np.zeros(Zs.shape[1], dtype=float)\n nonlin = np.zeros(Zs.shape[1], dtype=float)\n rs2 = np.tanh(rs)\n for k in range(Zs.shape[1]):\n zk = Zs[:, k]\n zk_std = float(np.std(zk))\n if zk_std < 1e-10:\n continue\n c1 = np.corrcoef(zk, rs)[0, 1]\n c2 = np.corrcoef(np.tanh(zk), rs)[0, 1]\n c3 = np.corrcoef(zk ** 2, rs2)[0, 1]\n corr[k] = 0.0 if not np.isfinite(c1) else abs(c1)\n nonlin[k] = max(0.0 if not np.isfinite(c2) else abs(c2), 0.0 if not np.isfinite(c3) else abs(c3))\n\n scores = np.nan_to_num(mi, nan=0.0, posinf=0.0, neginf=0.0) + 0.25 * corr + 0.15 * nonlin\n topk = min(3, len(scores))\n if topk == 0:\n return 0.0\n top = np.partition(scores, len(scores) - topk)[-topk:]\n return float(0.65 * np.max(top) + 0.35 * np.mean(top))\n\n def objective(target: int, parents: tuple, context: tuple) -> float:\n parents = tuple(sorted(parents))\n context = tuple(context)\n key = (target, parents, context)\n if key in obj_cache:\n return obj_cache[key]\n\n resid = get_resid(target, parents)\n dep = dep_score(X[:, context], resid) if len(context) > 0 else 0.0\n mse = float(np.mean(resid ** 2))\n complexity = len(parents) * (0.004 + 1.5 * np.log(max(n, 3)) / max(n, 1))\n value = dep + 0.10 * mse + complexity\n obj_cache[key] = 
value\n return value\n\n def cond_link_score(target: int, cand: int, cond: tuple) -> tuple:\n cond = tuple(sorted(cond))\n key = (target, cand, cond)\n if key in cond_cache:\n return cond_cache[key]\n\n ry = get_resid(target, cond)\n rx = get_resid(cand, cond)\n real = dep_score(rx[:, None], ry)\n perm_rng = np.random.default_rng(2027 + 97 * target + 31 * cand + 7 * len(cond))\n fake = dep_score(rx[perm_rng.permutation(n)][:, None], ry)\n cond_cache[key] = (real, fake)\n return real, fake\n\n def pair_pref(a: int, b: int) -> float:\n key = (a, b)\n if key in pair_cache:\n return pair_cache[key]\n\n s_ab = dep_score(X[:, [a]], get_resid(b, (a,)))\n s_ba = dep_score(X[:, [b]], get_resid(a, (b,)))\n pref = s_ba - s_ab # positive => a is more likely before b\n pair_cache[(a, b)] = pref\n pair_cache[(b, a)] = -pref\n return pref\n\n # Stage 1: sink peeling with pairwise ANM vote regularization.\n remaining = list(range(d))\n reverse_order = []\n while remaining:\n if len(remaining) == 1:\n reverse_order.append(remaining.pop())\n break\n\n best_node = None\n best_score = np.inf\n for j in remaining:\n others = tuple(k for k in remaining if k != j)\n resid = get_resid(j, others)\n score = dep_score(X[:, others], resid) + 0.10 * float(np.mean(resid ** 2))\n if len(others) > 1:\n score += 0.28 * float(np.mean([max(0.0, pair_pref(j, k)) for k in others]))\n if score < best_score:\n best_score = score\n best_node = j\n\n reverse_order.append(best_node)\n remaining.remove(best_node)\n\n order = reverse_order[::-1]\n\n # Stage 2: order-constrained parent search with conditional falsification.\n B = np.zeros((d, d), dtype=float)\n base_thr = 0.008 + 0.12 / np.sqrt(max(n, 2))\n\n for pos, target in enumerate(order):\n context = tuple(order[:pos])\n if len(context) == 0:\n continue\n\n local_rng = np.random.default_rng(1234 + target)\n Z_real = X[:, context]\n Z_fake = np.column_stack([local_rng.permutation(Z_real[:, i]) for i in range(Z_real.shape[1])])\n ranker = 
ExtraTreesRegressor(\n n_estimators=80 if n < 1000 else 64,\n max_depth=5,\n min_samples_leaf=max(2, n // 150),\n random_state=0,\n n_jobs=1,\n )\n ranker.fit(np.hstack([Z_real, Z_fake]), X[:, target])\n imp = ranker.feature_importances_\n comp = imp[: len(context)] - imp[len(context) :]\n\n rank_scores = []\n for i, cand in enumerate(context):\n pair_bonus = max(0.0, pair_pref(cand, target))\n uni = dep_score(X[:, [cand]], X[:, target])\n rank_scores.append(comp[i] + 0.35 * pair_bonus + 0.15 * uni)\n rank_scores = np.asarray(rank_scores)\n cand_order = [context[i] for i in np.argsort(-rank_scores)]\n\n screen_cap = min(len(context), max(4, int(np.ceil(2.0 * np.sqrt(len(context)) + 1))))\n cand_order = cand_order[:screen_cap]\n max_parents = min(screen_cap, max(2, int(np.ceil(1.2 * np.sqrt(len(context)) + 1))))\n\n selected = tuple()\n current = objective(target, selected, context)\n\n while len(selected) < max_parents:\n best_set = None\n best_value = current\n best_metric = base_thr\n single_stats = {}\n\n for cand in cand_order:\n if cand in selected:\n continue\n real, fake = cond_link_score(target, cand, selected)\n trial = tuple(sorted(selected + (cand,)))\n value = objective(target, trial, context)\n gain = current - value\n metric = (real - 1.10 * fake) + 0.90 * gain + 0.08 * max(0.0, pair_pref(cand, target))\n single_stats[cand] = (metric, value)\n if metric > best_metric:\n best_metric = metric\n best_set = trial\n best_value = value\n\n top_left = [c for c in cand_order if c not in selected]\n top_left = sorted(top_left, key=lambda c: single_stats.get(c, (-1e9, np.inf))[0], reverse=True)[:4]\n for a, b in combinations(top_left, 2):\n trial = tuple(sorted(selected + (a, b)))\n value = objective(target, trial, context)\n gain = current - value\n ra, fa = cond_link_score(target, a, selected)\n rb, fb = cond_link_score(target, b, selected)\n metric = gain + 0.60 * ((ra - fa) + (rb - fb))\n if metric > best_metric + 0.002:\n best_metric = metric\n best_set = 
trial\n best_value = value\n\n if best_set is None:\n break\n selected, current = best_set, best_value\n\n changed = True\n while changed and len(selected) > 0:\n changed = False\n full_value = objective(target, selected, context)\n for cand in list(selected):\n others = tuple(x for x in selected if x != cand)\n real, fake = cond_link_score(target, \n... (truncated)",
5 "after_line": 5,
6 "start_line": 3,
7 "end_line": 220
8}
OK: Replaced lines 3..220 with 283 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–285.
[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–285 | total: 285 lines]
1| import numpy as np
2|
3 # =====================================================================
4 # EDITABLE: implement run_causal_discovery below
5 # =====================================================================
6 def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7 """
8 Input: X of shape (n_samples, n_variables)
9 Output: adjacency matrix B of shape (n_variables, n_variables)
10 B[i, j] != 0 means j -> i (follows causal-learn convention)
11 """
12 import os
13 from itertools import combinations
14 from sklearn.ensemble import ExtraTreesRegressor, HistGradientBoostingRegressor
15 from sklearn.feature_selection import mutual_info_regression
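The second revision's Stage 2 screens candidate parents with a knockoff-style control: each context column gets a permuted "fake" copy, and a candidate is ranked by how much its tree importance exceeds its fake's. A minimal sketch of that ranking step on hypothetical data:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def knockoff_rank(Z, y, random_state=0):
    """Rank columns of Z by (real importance - permuted-copy importance)."""
    rng = np.random.default_rng(random_state)
    # Permuting each column independently destroys its relation to y
    # while preserving its marginal distribution.
    Z_fake = np.column_stack([rng.permutation(Z[:, i]) for i in range(Z.shape[1])])
    ranker = ExtraTreesRegressor(n_estimators=80, max_depth=5,
                                 random_state=random_state)
    ranker.fit(np.hstack([Z, Z_fake]), y)
    imp = ranker.feature_importances_
    comp = imp[: Z.shape[1]] - imp[Z.shape[1]:]
    return np.argsort(-comp)  # strongest candidates first
```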
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### ER8-MLP (scripts/eval_er8_mlp.sh)
Results (ER8-MLP):
SHD=6 F1=0.4000 Precision=0.3750 Recall=0.4286
### ER12-GP (scripts/eval_er12_gp.sh)
Results (ER12-GP):
SHD=16 F1=0.3871 Precision=0.5000 Recall=0.3158
### SF10-Mixed (scripts/eval_sf10_mixed.sh)
Results (SF10-Mixed):
SHD=10 F1=0.6667 Precision=0.6471 Recall=0.6875
### ER15-Sigmoid (scripts/eval_er15_sigmoid.sh)
Results (ER15-Sigmoid):
SHD=10 F1=0.7719 Precision=0.7097 Recall=0.8462
### ER20-Mixed (scripts/eval_er20_mixed.sh)
Results (ER20-Mixed):
SHD=54 F1=0.3366 Precision=0.3953 Recall=0.2931
### SF20-GP (scripts/eval_sf20_gp.sh)
Results (SF20-GP):
SHD=22 F1=0.6579 Precision=0.6250 Recall=0.6944
### ER12-LowSample (scripts/eval_er12_lowsample.sh)
Results (ER12-LowSample):
SHD=11 F1=0.5517 Precision=0.8000 Recall=0.4211
[Leaderboard] Results saved: {'shd_ER8-MLP': 6, 'f1_ER8-MLP': 0.4, 'precision_ER8-MLP': 0.375, 'reca
Tool Input
1{
2 "n": 1
3}
[submit] Submitting result from test #1 as final.
### ER8-MLP (scripts/eval_er8_mlp.sh)
Results (ER8-MLP):
SHD=6 F1=0.6316 Precision=0.5000 Recall=0.8571
### ER12-GP (scripts/eval_er12_gp.sh)
Results (ER12-GP):
SHD=17 F1=0.6538 Precision=0.5152 Recall=0.8947
### SF10-Mixed (scripts/eval_sf10_mixed.sh)
Results (SF10-Mixed):
SHD=13 F1=0.6341 Precision=0.5200 Recall=0.8125
### ER15-Sigmoid (scripts/eval_er15_sigmoid.sh)
Results (ER15-Sigmoid):
SHD=29 F1=0.5867 Precision=0.4490 Recall=0.8462
### ER20-Mixed (scripts/eval_er20_mixed.sh)
Results (ER20-Mixed):
SHD=58 F1=0.3860 Precision=0.3929 Recall=0.3793
### SF20-GP (scripts/eval_sf20_gp.sh)
Results (SF20-GP):
SHD=37 F1=0.5567 Precision=0.4426 Recall=0.7500
### ER12-LowSample (scripts/eval_er12_lowsample.sh)
Results (ER12-LowSample):
SHD=17 F1=0.5652 Precision=0.4815 Recall=0.6842
### ER20-Gauss (scripts/eval_er20_gauss.sh)
Results (ER20-Gauss):
SHD=62 F1=0.3871 Precision=0.3636 Recall=0.4138