Agent Conversation: gpt-5.4-pro
Model: gpt-5.4-pro
Total steps: 5
Messages: 11
User (Step 0)
Task: causal-observational-linear-non-gaussian
Causal Discovery: Observational Linear Non-Gaussian Data
Objective
Implement a causal discovery algorithm that recovers the DAG structure from purely observational data generated by a Linear Non-Gaussian Acyclic Model (LiNGAM). Your code goes in bench/custom_algorithm.py.
Background
LiNGAM-based methods exploit non-Gaussian noise to achieve full DAG identifiability from observational data alone, going beyond constraint-based (PC) and score-based (GES) methods, which can recover the graph only up to its Markov equivalence class.
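The identifiability claim is easiest to see pairwise. Below is a minimal sketch (illustrative only; the tanh-based dependence proxy is an assumed stand-in for a proper independence test) showing that with non-Gaussian noise the regression residual is independent of the regressor only in the true causal direction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy LiNGAM pair: x -> y with Laplace (non-Gaussian) noise.
n = 5_000
x = rng.laplace(size=n)
y = 1.5 * x + rng.laplace(size=n)

def dep(a, b):
    # OLS residual of b on a, then a crude nonlinear dependence proxy.
    # Linear correlation is zero by construction in both directions, so
    # only higher-order (non-Gaussian) dependence can separate them.
    r = b - (a @ b) / (a @ a) * a
    az = (a - a.mean()) / a.std()
    rz = (r - r.mean()) / r.std()
    return abs(np.mean(np.tanh(az) * rz)) + abs(np.mean(az * np.tanh(rz)))

print(dep(x, y))  # small: residual of y ~ x is independent of x
print(dep(y, x))  # larger: residual of x ~ y still depends on y
```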
Evaluation Scenarios
| Label | Graph type | Nodes | Samples | Noise |
|---|---|---|---|---|
| ER10 | Erdos-Renyi | 10 | 250 | Exponential |
| ER15 | Erdos-Renyi | 15 | 500 | Laplace |
| SF12 | Scale-Free (BA) | 12 | 300 | Uniform |
| ER30 | Erdos-Renyi | 30 | 1000 | Laplace |
| ER50 | Erdos-Renyi | 50 | 2000 | Exponential |
| ER50-LowSample | Erdos-Renyi | 50 | 250 | Exponential |
| SF100 | Scale-Free (BA) | 100 | 1000 | Uniform |
| ER20-Dense | Erdos-Renyi | 20 | 500 | Laplace |
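For orientation, a row of this table can be mimicked with a small sampler. The sketch below is an assumed reconstruction, not the benchmark's generator; the edge density, weight range, and noise centering are guesses. It uses the same row-is-child convention as the stub below (`B[i, j] != 0` means `j -> i`):

```python
import numpy as np

def sample_er_lingam(d=10, n=250, noise="exponential", p_edge=0.2, seed=0):
    """Sketch of an ER-graph LiNGAM sampler in the spirit of the table.
    p_edge, the weight range, and the noise centering are assumptions."""
    rng = np.random.default_rng(seed)
    # Strictly lower-triangular weights => acyclic by construction.
    W = rng.uniform(0.5, 2.0, (d, d)) * rng.choice([-1, 1], (d, d))
    B = np.tril(W, k=-1) * (rng.random((d, d)) < p_edge)
    noise_fn = {
        "exponential": lambda s: rng.exponential(size=s) - 1.0,  # centered
        "laplace":     lambda s: rng.laplace(size=s),
        "uniform":     lambda s: rng.uniform(-1.0, 1.0, size=s),
    }[noise]
    E = noise_fn((n, d))
    # x = B x + e  =>  X = E (I - B)^{-T}, applied row-wise over samples.
    X = E @ np.linalg.inv(np.eye(d) - B).T
    return X, B

X, B_true = sample_er_lingam()  # roughly the ER10 row: 10 nodes, 250 samples
```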
Metrics
All metrics are computed on the directed edge set (an edge counts only if both its skeleton and its direction are correct):
- F1 (primary ranking metric), SHD, Precision, Recall
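A plausible reading of these definitions, as a sketch (not the benchmark's actual scorer):

```python
import numpy as np

def directed_edge_metrics(B_est, B_true, tol=1e-8):
    """Precision/recall/F1 on directed edges plus a simple SHD.
    A plausible reading of the definitions above, not the exact scorer."""
    E = np.abs(B_est) > tol   # estimated directed edges
    T = np.abs(B_true) > tol  # true directed edges
    tp = np.sum(E & T)
    fp = np.sum(E & ~T)
    fn = np.sum(~E & T)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    # Count a reversed edge once, not as one FP plus one FN.
    rev = np.sum(E & ~T & T.T & ~E.T)
    shd = fp + fn - rev
    return f1, precision, recall, shd
```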
Baselines
- icalingam: ICA-based LiNGAM (Shimizu 2006)
- directlingam: DirectLiNGAM (Shimizu 2011)
causal-learn/bench/custom_algorithm.py [EDITABLE — lines 3–14 only]
1: import numpy as np
2:
3: # =====================================================================
4: # EDITABLE: implement run_causal_discovery below
5: # =====================================================================
6: def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7: """
8: Input: X of shape (n_samples, n_variables)
9: Output: adjacency matrix B of shape (n_variables, n_variables)
10: B[i, j] != 0 means j -> i (follows causal-learn convention)
11: """
12: n = X.shape[1]
13: return np.zeros((n, n))
14: # =====================================================================
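A quick sanity check of the adjacency convention documented in the stub, with hypothetical weights:

```python
import numpy as np

# Convention check: B[i, j] != 0 means j -> i, so row i holds the
# coefficients of i's parents. For the chain 0 -> 1 -> 2:
B = np.zeros((3, 3))
B[1, 0] = 2.0    # edge 0 -> 1 with weight 2.0
B[2, 1] = -1.5   # edge 1 -> 2 with weight -1.5

parents_of = {i: np.flatnonzero(B[i]).tolist() for i in range(3)}
print(parents_of)  # {0: [], 1: [0], 2: [1]}
```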
icalingam baseline — editable region [READ-ONLY — reference implementation]
Lines 1–16 shown:
1: import numpy as np
2:
3: # =====================================================================
4: # EDITABLE: implement run_causal_discovery below
5: # =====================================================================
6: def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7: """
8: Input: X of shape (n_samples, n_variables)
9: Output: adjacency matrix B of shape (n_variables, n_variables)
10: B[i, j] != 0 means j -> i (follows causal-learn convention)
11: """
12: import numpy as np
13: from scipy.optimize import linear_sum_assignment
14: from sklearn.decomposition import FastICA
15: from sklearn.utils import check_array
16:
Assistant (Step 1): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "causal-learn/bench/custom_algorithm.py",
4 "content": "import numpy as np\n\n# =====================================================================\n# EDITABLE: implement run_causal_discovery below\n# =====================================================================\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n \"\"\"\n Contrastive Residual LiNGAM:\n 1) estimate a causal order by repeatedly extracting the variable whose\n residuals are most contrastively independent in the forward vs reverse\n regression directions;\n 2) estimate sparse direct parents along that order with adaptive-lasso BIC.\n\n Output convention: B[i, j] != 0 means j -> i.\n \"\"\"\n from sklearn.linear_model import LassoLarsIC\n from sklearn.utils import check_array\n\n X = check_array(X, ensure_2d=True, dtype=float)\n n, p = X.shape\n if p == 0:\n return np.zeros((0, 0))\n if p == 1:\n return np.zeros((1, 1))\n\n eps = 1e-8\n Xc = X - X.mean(axis=0, keepdims=True)\n Xs = Xc / (np.sqrt(np.mean(Xc * Xc, axis=0, keepdims=True)) + eps)\n\n def _standardize(A: np.ndarray) -> np.ndarray:\n A = A - A.mean(axis=0, keepdims=True)\n return A / (np.sqrt(np.mean(A * A, axis=0, keepdims=True)) + eps)\n\n def _dep_cols_from_x(x: np.ndarray, R: np.ndarray) -> np.ndarray:\n x = x - x.mean()\n x = x / (np.sqrt(np.mean(x * x)) + eps)\n Rz = _standardize(R)\n x2 = x * x - 1.0\n R2 = Rz * Rz - 1.0\n return (\n np.abs(np.mean(np.tanh(x)[:, None] * Rz, axis=0))\n + np.abs(np.mean(x[:, None] * np.tanh(Rz), axis=0))\n + 0.3 * np.abs(np.mean(x2[:, None] * R2, axis=0))\n )\n\n def _dep_pairs(A: np.ndarray, R: np.ndarray) -> np.ndarray:\n Az = _standardize(A)\n Rz = _standardize(R)\n A2 = Az * Az - 1.0\n R2 = Rz * Rz - 1.0\n return (\n np.abs(np.mean(np.tanh(Az) * Rz, axis=0))\n + np.abs(np.mean(Az * np.tanh(Rz), axis=0))\n + 0.3 * np.abs(np.mean(A2 * R2, axis=0))\n )\n\n # ---------- Stage 1: contrastive exogenous extraction ----------\n remaining = list(range(p))\n order = []\n Z = Xs.copy()\n while len(remaining) > 1:\n m = Z.shape[1]\n scores = np.empty(m)\n col_var = np.sum(Z * Z, axis=0) + eps\n for j in range(m):\n x = Z[:, j]\n gx = Z.T @ x\n\n # Forward: regress every variable on candidate x_j.\n beta_f = gx / (x @ x + eps)\n Rf = Z - np.outer(x, beta_f)\n dep_f = _dep_cols_from_x(x, Rf)\n\n # Reverse: regress x_j on every other variable separately.\n beta_r = gx / col_var\n Rr = x[:, None] - Z * beta_r[None, :]\n dep_r = _dep_pairs(Z, Rr)\n\n corr = np.abs(gx) / np.sqrt((x @ x + eps) * col_var)\n mask = np.arange(m) != j\n # Novel contrastive score: prefer variables that are independent\n # in the forward direction but dependent in the reverse direction.\n scores[j] = np.mean((corr[mask] + 0.05) * np.log((dep_f[mask] + 1e-4) / (dep_r[mask] + 1e-4)))\n\n j_star = int(np.argmin(scores))\n order.append(remaining[j_star])\n\n x = Z[:, j_star]\n beta = (Z.T @ x) / (x @ x + eps)\n Z = Z - np.outer(x, beta)\n Z = np.delete(Z, j_star, axis=1)\n remaining.pop(j_star)\n if Z.size:\n Z = Z - Z.mean(axis=0, keepdims=True)\n order.extend(remaining)\n\n # ---------- Stage 2: sparse parent estimation along the order ----------\n B = np.zeros((p, p), dtype=float)\n for pos, tgt in enumerate(order):\n parents = order[:pos]\n if not parents:\n continue\n y = Xs[:, tgt]\n Xp = Xs[:, parents]\n\n if Xp.shape[1] == 1:\n coef = np.array([(Xp[:, 0] @ y) / (Xp[:, 0] @ Xp[:, 0] + eps)])\n if abs(coef[0]) > 0.05:\n B[tgt, parents[0]] = coef[0]\n continue\n\n ols = np.linalg.lstsq(Xp, y, rcond=None)[0]\n weights = 1.0 / (np.abs(ols) + 0.02)\n Xw = Xp / weights[None, :]\n try:\n model = 
LassoLarsIC(criterion=\"bic\")\n model.fit(Xw, y)\n coef = model.coef_ / weights\n except Exception:\n coef = ols\n\n sel = np.flatnonzero(np.abs(coef) > 1e-8)\n if sel.size == 0:\n continue\n\n beta = np.linalg.lstsq(Xp[:, sel], y, rcond=None)[0]\n for k, s in enumerate(sel):\n if abs(beta[k]) > 0.04:\n B[tgt, parents[s]] = beta[k]\n\n np.fill_diagonal(B, 0.0)\n return B\n# =====================================================================",
5 "after_line": 14,
6 "start_line": 3,
7 "end_line": 14
8}
Tool Result (Step 1)
OK: Replaced lines 3..14 with 133 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–135.
[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–135 | total: 135 lines]
1| import numpy as np
2|
3 import numpy as np
4
5 # =====================================================================
6 # EDITABLE: implement run_causal_discovery below
7 # =====================================================================
8 def run_causal_discovery(X: np.ndarray) -> np.ndarray:
9 """
10 Contrastive Residual LiNGAM:
11 1) estimate a causal order by repeatedly extracting the variable whose
12 residuals are most contrastively independent in the forward vs reverse
13 regression directions;
14 2) estimate sparse direct parents along that order with adaptive-lasso BIC.
15
16 Output convention: B[i, j] != 0 means j -> i.
17 """
18
Assistant (Step 2): test
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
### ER10 (scripts/eval_er10.sh)
Results (ER10):
SHD=2 F1=0.8966 Precision=0.8667 Recall=0.9286
### ER15 (scripts/eval_er15.sh)
Results (ER15):
SHD=2 F1=0.9474 Precision=0.9000 Recall=1.0000
### SF12 (scripts/eval_sf12.sh)
Results (SF12):
SHD=3 F1=0.9302 Precision=0.8696 Recall=1.0000
### ER30 (scripts/eval_er30.sh)
Results (ER30):
SHD=19 F1=0.8995 Precision=0.9126 Recall=0.8868
### ER50 (scripts/eval_er50.sh)
Results (ER50):
SHD=92 F1=0.8025 Precision=0.8298 Recall=0.7769
### ER50-LowSample (scripts/eval_er50_lowsample.sh)
Results (ER50-LowSample):
SHD=58 F1=0.7735 Precision=0.6852 Recall=0.8880
### ER20-Dense (scripts/eval_er20_dense.sh)
Results (ER20-Dense):
SHD=19 F1=0.8950 Precision=0.9643 Recall=0.8351
[Leaderboard] Results saved: {'shd_ER10': 2, 'f1_ER10': 0.8966, 'precision_ER10': 0.8667, 'recall_ER10': 0.9286, 'elapsed_ER10': 3, 'shd_ER15': 2, 'f1_ER15': 0.9474, 'pre
Assistant (Step 3): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "causal-learn/bench/custom_algorithm.py",
4 "content": "# =====================================================================\n# EDITABLE: implement run_causal_discovery below\n# =====================================================================\ndef run_causal_discovery(X: np.ndarray) -> np.ndarray:\n \"\"\"\n Stable Contrastive Direct LiNGAM.\n\n Novel contribution:\n - build several LiNGAM graphs from complementary inductive biases:\n (i) DirectLiNGAM support if available,\n (ii) full-order adaptive-lasso/backward-BIC refinement,\n (iii) a contrastive forward-vs-reverse residual ordering,\n (iv) a fused support graph combining direct and contrastive candidates;\n - select the final DAG by a residual-independence + sparsity criterion.\n\n Output convention: B[i, j] != 0 means j -> i.\n \"\"\"\n from sklearn.linear_model import LassoLarsIC\n from sklearn.utils import check_array\n\n X = check_array(X, ensure_2d=True, dtype=float)\n n, p = X.shape\n if p == 0:\n return np.zeros((0, 0))\n if p == 1:\n return np.zeros((1, 1))\n\n eps = 1e-8\n Xc = X - X.mean(axis=0, keepdims=True)\n Xs = Xc / (np.sqrt(np.mean(Xc * Xc, axis=0, keepdims=True)) + eps)\n\n def _standardize(A: np.ndarray) -> np.ndarray:\n A = A - A.mean(axis=0, keepdims=True)\n return A / (np.sqrt(np.mean(A * A, axis=0, keepdims=True)) + eps)\n\n def _dep_cols_from_x(x: np.ndarray, R: np.ndarray) -> np.ndarray:\n x = x - x.mean()\n x = x / (np.sqrt(np.mean(x * x)) + eps)\n Rz = _standardize(R)\n x2 = x * x - 1.0\n R2 = Rz * Rz - 1.0\n return (\n np.abs(np.mean(np.tanh(x)[:, None] * Rz, axis=0))\n + np.abs(np.mean(x[:, None] * np.tanh(Rz), axis=0))\n + 0.3 * np.abs(np.mean(x2[:, None] * R2, axis=0))\n )\n\n def _dep_pairs(A: np.ndarray, R: np.ndarray) -> np.ndarray:\n Az = _standardize(A)\n Rz = _standardize(R)\n A2 = Az * Az - 1.0\n R2 = Rz * Rz - 1.0\n return (\n np.abs(np.mean(np.tanh(Az) * Rz, axis=0))\n + np.abs(np.mean(Az * np.tanh(Rz), axis=0))\n + 0.3 * np.abs(np.mean(A2 * R2, axis=0))\n )\n\n def _residual_dep_score(E: np.ndarray) -> float:\n Ez = _standardize(E)\n T = np.tanh(Ez)\n E2 = Ez * Ez - 1.0\n M = np.abs((T.T @ Ez) / max(Ez.shape[0], 1))\n Q = np.abs((E2.T @ E2) / max(Ez.shape[0], 1))\n S = M + M.T + 0.2 * Q\n np.fill_diagonal(S, 0.0)\n return float(np.sum(S) / (p * max(p - 1, 1)))\n\n def _contrastive_order(Z0: np.ndarray) -> list:\n remaining = list(range(p))\n order = []\n Z = Z0.copy()\n while len(remaining) > 1:\n m = Z.shape[1]\n scores = np.empty(m)\n col_var = np.sum(Z * Z, axis=0) + eps\n for j in range(m):\n x = Z[:, j]\n gx = Z.T @ x\n\n beta_f = gx / (x @ x + eps)\n Rf = Z - np.outer(x, beta_f)\n dep_f = _dep_cols_from_x(x, Rf)\n\n beta_r = gx / col_var\n Rr = x[:, None] - Z * beta_r[None, :]\n dep_r = _dep_pairs(Z, Rr)\n\n corr = np.abs(gx) / np.sqrt((x @ x + eps) * col_var)\n mask = np.arange(m) != j\n scores[j] = np.mean(\n (corr[mask] + 0.05)\n * np.log((dep_f[mask] + 1e-4) / (dep_r[mask] + 1e-4))\n )\n\n j_star = int(np.argmin(scores))\n order.append(remaining[j_star])\n\n x = Z[:, j_star]\n beta = (Z.T @ x) / (x @ x + eps)\n Z = Z - np.outer(x, beta)\n Z = np.delete(Z, j_star, axis=1)\n remaining.pop(j_star)\n if Z.size:\n Z = Z - Z.mean(axis=0, keepdims=True)\n order.extend(remaining)\n return order\n\n def _topological_from_support(support: np.ndarray) -> list:\n parents = [set(np.flatnonzero(support[i])) for i in range(p)]\n children = [set(np.flatnonzero(support[:, j])) for j in range(p)]\n roots = [i for i in range(p) if not parents[i]]\n order = []\n used = np.zeros(p, dtype=bool)\n while roots:\n r = roots.pop(0)\n if 
used[r]:\n continue\n used[r] = True\n order.append(r)\n for c in list(children[r]):\n parents[c].discard(r)\n if not parents[c]:\n roots.append(c)\n children[r].clear()\n if len(order) < p:\n missing = [i for i in range(p) if not used[i]]\n score = np.sum(np.abs(support), axis=1) - np.sum(np.abs(support), axis=0)\n order.extend(sorted(missing, key=lambda i: score[i]))\n return order\n\n def _bic_subset(y: np.ndarray, Xp: np.ndarray) -> np.ndarray:\n m = Xp.shape[1]\n coef = np.zeros(m, dtype=float)\n if m == 0:\n return coef\n\n y_var = np.mean(y * y) + eps\n if m == 1:\n b = float((Xp[:, 0] @ y) / (Xp[:, 0] @ Xp[:, 0] + eps))\n rss1 = np.mean((y - Xp[:, 0] * b) ** 2) + eps\n bic1 = n * np.log(rss1) + np.log(max(n, 2))\n bic0 = n * np.log(y_var)\n if bic1 < bic0:\n coef[0] = b\n return coef\n\n ols = np.linalg.lstsq(Xp, y, rcond=None)[0]\n weights = 1.0 / (np.abs(ols) + 0.02)\n try:\n model = LassoLarsIC(criterion=\"bic\")\n model.fit(Xp / weights[None, :], y)\n warm = model.coef_ / weights\n active = np.flatnonzero(np.abs(warm) > 1e-8).tolist()\n except Exception:\n active = np.flatnonzero(np.abs(ols) > 0.03).tolist()\n\n if not active:\n return coef\n\n while True:\n Xcur = Xp[:, active]\n beta = np.linalg.lstsq(Xcur, y, rcond=None)[0]\n rss_cur = np.mean((y - Xcur @ beta) ** 2) + eps\n bic_cur = n * np.log(rss_cur) + len(active) * np.log(max(n, 2))\n\n best_active = active\n best_bic = bic_cur\n for t in range(len(active)):\n cand = active[:t] + active[t + 1 :]\n if cand:\n Xcand = Xp[:, cand]\n beta_c = np.linalg.lstsq(Xcand, y, rcond=None)[0]\n rss_c = np.mean((y - Xcand @ beta_c) ** 2) + eps\n bic_c = n * np.log(rss_c) + len(cand) * np.log(max(n, 2))\n else:\n bic_c = n * np.log(y_var)\n if bic_c + 1e-8 < best_bic:\n best_bic = bic_c\n best_active = cand\n\n if len(best_active) == len(active) and all(a == b for a, b in zip(best_active, active)):\n break\n active = best_active\n if not active:\n return coef\n\n beta = np.linalg.lstsq(Xp[:, active], y, rcond=None)[0]\n coef[np.array(active)] = beta\n coef[np.abs(coef) < 1e-8] = 0.0\n return coef\n\n def _ols_support(y: np.ndarray, Xp: np.ndarray) -> np.ndarray:\n if Xp.shape[1] == 0:\n return np.zeros(0, dtype=float)\n beta = np.linalg.lstsq(Xp, y, rcond=None)[0]\n beta[np.abs(beta) < 1e-8] = 0.0\n return beta\n\n def _estimate_from_order(order: list, support: np.ndarray = None, refine: bool = True) -> np.ndarray:\n B = np.zeros((p, p), dtype=float)\n for pos, tgt in enumerate(order):\n prev = order[:pos]\n if support is None:\n parents = prev\n else:\n parents = [j for j in prev if support[tgt, j]]\n if not parents:\n continue\n Xp = Xs[:, parents]\n y = Xs[:, tgt]\n beta = _bic_subset(y, Xp) if refine else _ols_support(y, Xp)\n if beta.size:\n B[tgt, parents] = beta\n np.fill_diagonal(B, 0.0)\n return B\n\n def _graph_score(B: np.ndarray) -> float:\n E = Xs - Xs @ B.T\n rss = np.mean(E * E, axis=0) + eps\n dep = _residual_dep_score(E)\n l0 = np.count_nonzero(np.abs(B) > 1e-10)\n bic_like = np.mean(np.log(rss)) + 0.5 * l0 * np.log(max(n, 2)) / max(n * p, 1)\n return float(dep + 0.15 * bic_like)\n\n def _fit_direct() -> tuple:\n DirectLiNGAM = None\n try:\n from causallearn.search.FCMBased.lingam import DirectLiNGAM as _DL\n\n DirectLiNGAM = _DL\n except Exception:\n try:\n from causallearn.search.FCMBased.lingam.direct_lingam import DirectLiNGAM as _DL\n\n DirectLiNGAM = _DL\n except Exception:\n return None, None\n\n try:\n model = DirectLiNGAM()\n model.fit(X)\n B_raw = getattr(model, \"adjacency_matrix_\", None)\n if B_raw is 
None:\n B_raw = getattr(model, \"_adjacency_matrix\", None)\n if B_raw is None:\n return None, None\n B_raw = np.asarray(B_raw, dtype=float)\n support = np.abs(B_raw) > 1e-10\n order = getattr(model, \"causal_order_\", None)\n if order is None or len(order) != p:\n order = _topological_from_support(support)\n else:\n order = [int(v) for v in order]\n return support, order\n except Exception:\n return None, None\n\n candidates = []\n\n support_direct, order_direct = _fit_direct()\n if support_direct is not None and order_direct is not None:\n candidates.append(_estimate_from_order(order_direct, support_direct, refine=False))\n candidates.append(_estimate_from_order(order_direct, support_direct, refine=True))\n candidates.append(_estimate_from_order(order_direct, support=None, refine=True))\n\n order_contrast = _contrastive_order(Xs)\n B_contrast = _estimate_from_order(order_contrast, support=None, refine=True)\n candidates.append(B_contrast)\n\n if support_direct is not None and order_direct is not None:\n \n... (truncated)",
5 "after_line": 2,
6 "start_line": 3,
7 "end_line": 135
8}
Tool Result (Step 3)
OK: Replaced lines 3..135 with 294 line(s) in causal-learn/bench/custom_algorithm.py. Editable range: 3–296.
[Current file: causal-learn/bench/custom_algorithm.py | editable: 3–296 | total: 296 lines]
1| import numpy as np
2|
3 # =====================================================================
4 # EDITABLE: implement run_causal_discovery below
5 # =====================================================================
6 def run_causal_discovery(X: np.ndarray) -> np.ndarray:
7 """
8 Stable Contrastive Direct LiNGAM.
9
10 Novel contribution:
11 - build several LiNGAM graphs from complementary inductive biases:
12 (i) DirectLiNGAM support if available,
13 (ii) full-order adaptive-lasso/backward-BIC refinement,
14 (iii) a contrastive forward-vs-reverse residual ordering,
15 (iv) a fused support graph combining direct and contrastive candidates;
16 - select the final DAG by a residual-independence + sparsity criterion.
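The tool input above is truncated before the final model-selection loop. Judging from the docstring and the `_graph_score` helper it defines, the elided step presumably scores every candidate graph and keeps the best one; a hypothetical reconstruction (the wrapper and its name are illustrative, not the agent's verbatim code):

```python
import numpy as np

def _select_best(candidates, graph_score, p):
    """Sketch of the elided final step: score each candidate DAG with the
    residual-independence + sparsity criterion and return the minimizer.
    `graph_score` stands in for the _graph_score helper defined above."""
    best_B, best_score = np.zeros((p, p)), np.inf
    for B_cand in candidates:
        s = graph_score(B_cand)  # residual dependence + BIC-like penalty
        if s < best_score:
            best_B, best_score = B_cand, s
    return best_B
```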
Assistant (Step 4): test
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### ER10 (scripts/eval_er10.sh)
Results (ER10):
SHD=0 F1=1.0000 Precision=1.0000 Recall=1.0000
### ER15 (scripts/eval_er15.sh)
Results (ER15):
SHD=2 F1=0.9474 Precision=0.9000 Recall=1.0000
### SF12 (scripts/eval_sf12.sh)
Results (SF12):
SHD=1 F1=0.9756 Precision=0.9524 Recall=1.0000
### ER30 (scripts/eval_er30.sh)
Results (ER30):
SHD=1 F1=0.9953 Precision=0.9907 Recall=1.0000
### ER50 (scripts/eval_er50.sh)
Results (ER50):
SHD=21 F1=0.9569 Precision=0.9873 Recall=0.9283
### ER50-LowSample (scripts/eval_er50_lowsample.sh)
Results (ER50-LowSample):
SHD=30 F1=0.8832 Precision=0.8121 Recall=0.9680
### ER20-Dense (scripts/eval_er20_dense.sh)
Results (ER20-Dense):
SHD=7 F1=0.9641 Precision=0.9592 Recall=0.9691
[Leaderboard] Results saved: {'shd_ER10': 0, 'f1_ER10': 1.0, 'precision_ER10': 1.0, 'recall_ER10': 1.0, 'elapsed_ER10': 3, 'shd_ER15': 2, 'f1_ER15': 0.9474, 'precision_ER15'
Assistant (Step 5): submit
Tool Input
1{
2 "n": 2
3}
Tool Result (Step 5)
[submit] Submitting result from test #2 as final.
### ER10 (scripts/eval_er10.sh)
Results (ER10):
SHD=0 F1=1.0000 Precision=1.0000 Recall=1.0000
### ER15 (scripts/eval_er15.sh)
Results (ER15):
SHD=2 F1=0.9474 Precision=0.9000 Recall=1.0000
### SF12 (scripts/eval_sf12.sh)
Results (SF12):
SHD=1 F1=0.9756 Precision=0.9524 Recall=1.0000
### ER30 (scripts/eval_er30.sh)
Results (ER30):
SHD=1 F1=0.9953 Precision=0.9907 Recall=1.0000
### ER50 (scripts/eval_er50.sh)
Results (ER50):
SHD=21 F1=0.9569 Precision=0.9873 Recall=0.9283
### ER50-LowSample (scripts/eval_er50_lowsample.sh)
Results (ER50-LowSample):
SHD=30 F1=0.8832 Precision=0.8121 Recall=0.9680
### SF100 (scripts/eval_sf100.sh)
Results (SF100):
SHD=17 F1=0.9715 Precision=0.9477 Recall=0.9966
### ER20-Dense (scripts/eval_er20_dense.sh)
Results (ER20-Dense):
SHD=7 F1=0.9641 Precision=0.9592 Recall=0.9691
[Leaderboard] Results saved: {'shd_ER10': 0, 'f1_ER10': 1.0, 'precision_ER10': 1.0, 're