Agent Conversation: openai/gpt-5.4
Task: ml-subgroup-calibration-shift
Subgroup Calibration Under Distribution Shift
Research Question
Design a post-hoc calibration method that remains reliable when subgroup composition shifts between calibration and test time.
Background
Many calibration methods look good on average but fail on protected or operational subgroups once the test distribution shifts. This task isolates that failure mode. The fixed pipeline trains a tabular classifier, then applies a user-defined calibration mapping on held-out calibration data before evaluation on shifted test data.
Classical baselines include:
- Temperature scaling: one global temperature for all samples
- Isotonic regression: non-parametric monotone calibration
- Beta calibration: a richer parametric mapping on probabilities
- Group-wise temperature scaling: separate temperatures per subgroup
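Of these baselines, temperature scaling is the easiest to make concrete. A minimal sketch (illustrative only; `fit_temperature` is a hypothetical helper, not part of the fixed pipeline):

```python
import numpy as np
from scipy import optimize, special

def fit_temperature(probs, labels, eps=1e-6):
    """Fit one global temperature T by minimizing NLL of expit(logit(p) / T)."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    y = np.asarray(labels, dtype=float)
    z = special.logit(p)

    def nll(log_t):
        # Optimize log T so the search space is unconstrained in sign.
        q = np.clip(special.expit(z / np.exp(log_t)), eps, 1.0 - eps)
        return -np.mean(y * np.log(q) + (1.0 - y) * np.log(1.0 - q))

    res = optimize.minimize_scalar(nll, bounds=(-3.0, 3.0), method="bounded")
    return float(np.exp(res.x))
```

A fitted T > 1 shrinks logits (overconfident base model); T < 1 sharpens them.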
Task
Modify the CalibrationMethod class in custom_subgroup_calibration.py. The fixed code loads data, creates a shifted split, trains the base classifier, and computes metrics. Your method only controls the post-hoc calibration mapping.
class CalibrationMethod:
    def fit(self, probs, labels, groups=None):
        ...

    def predict_proba(self, probs, groups=None):
        ...
Inputs are positive-class probabilities from the base classifier. groups contains subgroup IDs when available and may be ignored by group-agnostic methods.
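The smallest implementation satisfying this contract is a pass-through mapping; the class name below is illustrative and not part of the fixed code:

```python
import numpy as np

class IdentityCalibration:
    """Minimal CalibrationMethod: return probabilities unchanged."""

    def fit(self, probs, labels, groups=None):
        # Nothing to learn; a real method would estimate its mapping here.
        return self

    def predict_proba(self, probs, groups=None):
        # Clip to an open interval so downstream log-based metrics stay finite.
        return np.clip(np.asarray(probs, dtype=float), 1e-6, 1.0 - 1e-6)
```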
Evaluation
This benchmark uses three lightweight tabular proxies that are already available in the current scikit-learn package setup. We would normally prefer Adult, ACSIncome, COMPAS, and Law School Admissions, but those require package-level data changes that are outside this task directory. To keep the benchmark runnable offline, we use cached scikit-learn datasets with similar calibration and subgroup-shift behavior:
- breast_cancer: binary classification on the scikit-learn breast cancer dataset
- california_housing: binary high-value/low-value decision built from California housing
- diabetes: binary high-risk/low-risk decision built from the diabetes target
For each dataset, the split is intentionally shifted:
- a domain score determines the held-out test tail
- subgroup labels are quartiles of a separate proxy feature
- calibration is fit on the source region and evaluated on the shifted region
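The three bullets above can be sketched as follows; this is a hypothetical reconstruction (`shifted_split`, the column indices, and the tail fraction are assumptions, not the pipeline's actual code):

```python
import numpy as np

def shifted_split(X, domain_col, group_col, test_frac=0.3):
    """Hold out the upper tail of a domain score as the shifted test region;
    subgroup IDs are quartiles of a separate proxy feature."""
    domain = X[:, domain_col]
    cutoff = np.quantile(domain, 1.0 - test_frac)
    test_mask = domain >= cutoff  # shifted region: top of the domain score
    quartiles = np.quantile(X[:, group_col], [0.25, 0.5, 0.75])
    groups = np.digitize(X[:, group_col], quartiles)  # subgroup IDs 0..3
    return ~test_mask, test_mask, groups
```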
Metrics
Lower is better for:
- worst_group_ece
- brier
- max_subgroup_gap
Higher is better for:
- subgroup_auroc
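A plausible reading of the headline metric, worst_group_ece (equal-width binning here is an assumption; the benchmark's exact binning scheme may differ):

```python
import numpy as np

def ece(probs, labels, n_bins=10):
    """Expected calibration error over equal-width probability bins."""
    probs, labels = np.asarray(probs, dtype=float), np.asarray(labels, dtype=float)
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        m = bins == b
        if m.any():
            # Weight each bin's |confidence - accuracy| gap by its occupancy.
            total += m.mean() * abs(probs[m].mean() - labels[m].mean())
    return total

def worst_group_ece(probs, labels, groups, n_bins=10):
    """Max per-subgroup ECE: the score is driven by the worst-served group."""
    probs, labels, groups = map(np.asarray, (probs, labels, groups))
    return max(ece(probs[groups == k], labels[groups == k], n_bins)
               for k in np.unique(groups))
```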
Notes
- The task is deliberately low compute and should run with a small tabular classifier.
- If you need the exact Adult/ACSIncome/COMPAS/Law School datasets, they should be added through a package-level data change, not inside this task directory.
scikit-learn/custom_subgroup_calibration.py [EDITABLE — lines 72–91 only]
1: """Subgroup calibration under distribution shift.
2:
3: The benchmark is intentionally offline and low compute. It uses
Tool Input
{
  "op": "replace",
  "filename": "scikit-lelearn/custom_subgroup_calibration.py",
  "content": "class CalibrationMethod:\n \"\"\"Balanced hierarchical subgroup logit calibration with robust shrinkage.\"\"\"\n\n def __init__(self):\n self.eps = 1e-6\n self.global_a_, self.global_b_ = 1.0, 0.0\n self.group_a_, self.group_b_, self.group_mix_ = {}, {}, {}\n\n def fit(self, probs, labels, groups=None):\n p = np.clip(np.asarray(probs).reshape(-1), self.eps, 1.0 - self.eps)\n y = np.asarray(labels).reshape(-1).astype(int)\n g = np.zeros_like(y) if groups is None else np.asarray(groups).reshape(-1).astype(int)\n z = special.logit(p)\n uniq, inv = np.unique(g, return_inverse=True)\n cnt = np.bincount(inv).astype(float)\n shrink = np.sqrt(np.median(cnt) / np.maximum(cnt, 1.0))\n tau_mix, G = 20.0, len(uniq)\n\n def unpack(theta):\n raw_da = theta[2 : 2 + G]\n raw_db = theta[2 + G : 2 + 2 * G]\n da = raw_da - np.average(raw_da, weights=cnt)\n db = raw_db - np.average(raw_db, weights=cnt)\n ga = np.exp(np.clip(theta[0] + da, -2.5, 2.5))\n gb = np.clip(theta[1] + db, -4.0, 4.0)\n base = special.expit(np.exp(np.clip(theta[0], -2.5, 2.5)) * z + np.clip(theta[1], -4.0, 4.0))\n grp = special.expit(ga[inv] * z + gb[inv])\n mix = cnt[inv] / (cnt[inv] + tau_mix)\n pred = np.clip((1.0 - mix) * base + mix * grp, self.eps, 1.0 - self.eps)\n return pred, da, db, ga, gb\n\n def objective(theta):\n pred, da, db, _, _ = unpack(theta)\n sq = (pred - y) ** 2\n gbrier = np.array([sq[inv == i].mean() for i in range(G)])\n ggap = np.array([abs(pred[inv == i].mean() - y[inv == i].mean()) for i in range(G)])\n robust = special.logsumexp(18.0 * gbrier) / 18.0 + 0.7 * special.logsumexp(28.0 * ggap) / 28.0\n balanced = gbrier.mean() + 0.5 * ggap.mean()\n reg = 0.01 * (theta[0] ** 2 + 0.25 * theta[1] ** 2) + 0.05 * np.mean(shrink * (da**2 + 0.25 * db**2))\n return float(sq.mean() + 0.7 * balanced + 1.1 * robust + reg)\n\n x0 = np.zeros(2 + 2 * G)\n bnds = [(-2.5, 2.5), (-4.0, 4.0)] + [(-1.5, 1.5)] * G + [(-2.5, 2.5)] * G\n res = optimize.minimize(objective, x0=x0, method=\"L-BFGS-B\", bounds=bnds)\n theta = res.x if res.success else x0\n _, da, db, ga, gb = unpack(theta)\n self.global_a_ = float(np.exp(np.clip(theta[0], -2.5, 2.5)))\n self.global_b_ = float(np.clip(theta[1], -4.0, 4.0))\n self.group_a_ = {int(k): float(ga[i]) for i, k in enumerate(uniq)}\n self.group_b_ = {int(k): float(gb[i]) for i, k in enumerate(uniq)}\n self.group_mix_ = {int(k): float(cnt[i] / (cnt[i] + tau_mix)) for i, k in enumerate(uniq)}\n return self\n\n def predict_proba(self, probs, groups=None):\n p = np.clip(np.asarray(probs).reshape(-1), self.eps, 1.0 - self.eps)\n z = special.logit(p)\n base = special.expit(self.global_a_ * z + self.global_b_)\n if groups is None:\n return np.clip(base, self.eps, 1.0 - self.eps)\n g = np.asarray(groups).reshape(-1).astype(int)\n out = np.array(base, copy=True)\n for k in np.unique(g):\n m = g == k\n a = self.group_a_.get(int(k), self.global_a_)\n b = self.group_b_.get(int(k), self.global_b_)\n w = self.group_mix_.get(int(k), 0.0)\n out[m] = (1.0 - w) * base[m] + w * special.expit(a * z[m] + b)\n return np.clip(out, self.eps, 1.0 - self.eps)",
  "after_line": 91,
  "start_line": 72,
  "end_line": 91
}
ERROR: Package 'scikit-lelearn' is not in allowed packages
Tool Input
{
  "op": "replace",
  "filename": "custom_subgroup_calibration.py",
  "content": "(same CalibrationMethod implementation as above)",
  "after_line": 91,
  "start_line": 72,
  "end_line": 91
}
ERROR: Package 'custom_subgroup_calibration.py' is not in allowed packages
Tool Input
{
  "op": "replace",
  "filename": "scikit_learn/custom_subgroup_calibration.py",
  "content": "(same CalibrationMethod implementation as above)",
  "after_line": 91,
  "start_line": 72,
  "end_line": 91
}
ERROR: File not editable: scikit_learn/custom_subgroup_calibration.py
Tool Input
{
  "op": "replace",
  "filename": "sklearn/custom_subgroup_calibration.py",
  "content": "(same CalibrationMethod implementation as above)",
  "after_line": 91,
  "start_line": 72,
  "end_line": 91
}
ERROR: Package 'sklearn' is not in allowed packages
Tool Input
{
  "op": "replace",
  "filename": "ml-subgroup-calibration-shift/scikit-learn/custom_subgroup_calibration.py",
  "content": "(same CalibrationMethod implementation as above)",
  "after_line": 91,
  "start_line": 72,
  "end_line": 91
}
ERROR: Package 'ml-subgroup-calibration-shift' is not in allowed packages
Tool Input
{
  "op": "replace",
  "filename": "scikit/custom_subgroup_calibration.py",
  "content": "(same CalibrationMethod implementation as above)",
  "after_line": 91,
  "start_line": 72,
  "end_line": 91
}
ERROR: Package 'scikit' is not in allowed packages
Tool Input
{
  "op": "replace",
  "filename": "scikitlearn/custom_subgroup_calibration.py",
  "content": "(same CalibrationMethod implementation as above)",
  "after_line": 91,
  "start_line": 72,
  "end_line": 91
}
ERROR: File not editable: scikitlearn/custom_subgroup_calibration.py
Tool Input
{
  "op": "replace",
  "filename": "scikitlearn/scikit-learn/custom_subgroup_calibration.py",
  "content": "(same CalibrationMethod implementation as above)",
  "after_line": 91,
  "start_line": 72,
  "end_line": 91
}
ERROR: File not editable: scikitlearn/scikit-learn/custom_subgroup_calibration.py
Tool Input
{
  "op": "replace",
  "filename": "scikitlearn/scikit_learn/custom_subgroup_calibration.py",
  "content": "(same CalibrationMethod implementation as above)",
  "after_line": 91,
  "start_line": 72,
  "end_line": 91
}
ERROR: File not editable: scikitlearn/scikit_learn/custom_subgroup_calibration.py