Agent Conversation: openai/gpt-5.4
Task: ml-active-learning
Active Learning: Query Strategy Design
Research Question
Design a novel pool-based active learning query strategy that outperforms existing methods (uncertainty sampling, entropy sampling, BADGE, BAIT, BALD) across diverse tabular classification datasets.
Background
Active learning aims to minimize labeling cost by intelligently selecting which unlabeled samples to query for labels. In pool-based active learning, a query strategy selects batches of samples from an unlabeled pool to be labeled by an oracle, then the model is retrained. The goal is to achieve the highest possible accuracy with the fewest labeled samples.
Classic approaches include:
- Uncertainty Sampling: Select samples where the model is least confident (lowest max predicted probability)
- Entropy Sampling: Select samples with highest predictive entropy
- Query By Committee: Select samples with maximal disagreement among an ensemble
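The first two of these scores can be sketched in a few lines of NumPy. This is an illustrative sketch of the scoring rules only, not code from the benchmark repository:

```python
import numpy as np

def uncertainty_scores(probs: np.ndarray) -> np.ndarray:
    # Least-confidence uncertainty: 1 minus the max class probability;
    # higher means the model is less sure of its top prediction.
    return 1.0 - probs.max(axis=1)

def entropy_scores(probs: np.ndarray) -> np.ndarray:
    # Predictive entropy of the softmax output; higher means more uncertain.
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def select_batch(scores: np.ndarray, n: int) -> np.ndarray:
    # Return the indices of the n highest-scoring pool points.
    return np.argsort(scores)[::-1][:n]

# Toy pool of 3 samples, 2 classes: the 50/50 sample should be queried first.
probs = np.array([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]])
picked = select_batch(entropy_scores(probs), 1)
```

Entropy and least-confidence agree on this toy pool; they diverge mainly in multi-class settings, where entropy also accounts for the tail of the predictive distribution.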
Modern approaches incorporate diversity and information-theoretic principles:
- BADGE (Ash et al., ICLR 2020): Uses gradient embeddings with k-means++ for diverse, uncertain batch selection
- BAIT (Ash et al., NeurIPS 2021): Optimizes Fisher information to select maximally informative batches
- BALD (Houlsby et al., 2011): Uses MC Dropout to estimate mutual information between predictions and parameters
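To make the batch-diversity idea concrete, BADGE's selection step is essentially k-means++ seeding over gradient embeddings: the first pick is the largest-norm embedding (roughly, the most uncertain point), and each later pick is drawn with probability proportional to its squared distance from the batch so far. The snippet below is a simplified sketch of that seeding rule under those assumptions, not the reference implementation:

```python
import numpy as np

def kmeanspp_indices(emb: np.ndarray, n: int, rng=None) -> np.ndarray:
    # k-means++ style seeding over gradient embeddings: start from the
    # largest-norm point, then sample each next point with probability
    # proportional to its squared distance to the nearest point already
    # in the batch -- jointly favoring uncertainty and diversity.
    rng = np.random.default_rng(rng)
    chosen = [int(np.argmax((emb ** 2).sum(axis=1)))]
    d2 = ((emb - emb[chosen[0]]) ** 2).sum(axis=1)
    while len(chosen) < n:
        j = int(rng.choice(len(emb), p=d2 / d2.sum()))
        chosen.append(j)
        d2 = np.minimum(d2, ((emb - emb[j]) ** 2).sum(axis=1))
    return np.asarray(chosen)

# Two tight clusters: the second pick must come from the far cluster,
# since points in the first cluster have (near-)zero distance to the batch.
emb = np.array([[10.0, 0.0], [10.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
batch = kmeanspp_indices(emb, 2, rng=0)
```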
Task
Modify the CustomSampling class in badge/query_strategies/custom_sampling.py to implement a novel query strategy. The strategy must implement the query(n) method that returns n indices from the unlabeled pool.
Interface
class CustomSampling(Strategy):
    def __init__(self, X, Y, idxs_lb, net, handler, args):
        super().__init__(X, Y, idxs_lb, net, handler, args)

    def query(self, n) -> np.ndarray:
        # Must return n indices into self.X of unlabeled samples to label
        ...
Available from the Strategy base class:
- self.X: pool features (numpy array, shape [n_pool, n_features])
- self.Y: pool labels (torch LongTensor, shape [n_pool])
- self.idxs_lb: boolean mask of labeled samples
- self.n_pool: total pool size
- self.predict_prob(X, Y): softmax probabilities [len(X), n_classes]
- self.predict_prob_dropout_split(X, Y, n_drop): MC dropout probs [n_drop, len(X), n_classes]
- self.get_embedding(X, Y): penultimate-layer embeddings [len(X), emb_dim]
- self.get_grad_embedding(X, Y): gradient embeddings [len(X), emb_dim * n_classes]
- self.get_exp_grad_embedding(X, Y): expected Fisher embeddings [len(X), n_classes, emb_dim]
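As a concrete reference point, the simplest strategy satisfying this interface is entropy sampling over the unlabeled pool. The sketch below mimics the contract with a stub predict_prob over precomputed softmax outputs (a hypothetical stand-in for the Strategy base class, so the example runs standalone):

```python
import numpy as np

class EntropyQuerySketch:
    # Minimal stand-in for the Strategy interface; predict_prob here is a
    # stub over fixed softmax outputs, purely for illustration.
    def __init__(self, probs, idxs_lb):
        self.probs = np.asarray(probs)        # [n_pool, n_classes]
        self.idxs_lb = np.asarray(idxs_lb)    # boolean mask of labeled samples
        self.n_pool = len(self.probs)

    def predict_prob(self, idxs):
        return self.probs[idxs]

    def query(self, n):
        # Rank unlabeled points by predictive entropy and return the top n,
        # as indices into the full pool -- the query(n) contract above.
        idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]
        p = self.predict_prob(idxs_unlabeled)
        ent = -(p * np.log(p + 1e-12)).sum(axis=1)
        return idxs_unlabeled[np.argsort(ent)[::-1][:n]]
```

Note that the returned indices address the full pool, not the unlabeled subset; returning subset-local indices is a common off-by-mapping bug with this interface.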
Evaluation
- Datasets: 3 OpenML tabular classification datasets (letter recognition, spambase, splice)
- Protocol: 20 rounds of batch active learning, evaluated after each round
- Metrics:
  - accuracy: Test accuracy at the end of 20 AL rounds (fixed label budget)
  - auc: Area under the learning curve (accuracy vs. number of labeled samples), measuring sample efficiency
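The auc metric can be read as the trapezoid-rule area under the accuracy-vs-labels curve, normalized by the label budget so it stays in [0, 1]. A minimal sketch of that computation, assuming this normalization (the harness may define it slightly differently):

```python
import numpy as np

def learning_curve_auc(n_labeled, accuracy):
    # Trapezoid-rule area under accuracy vs. number of labeled samples,
    # normalized by the total label range: a higher value means the
    # strategy reached good accuracy with fewer labels.
    n = np.asarray(n_labeled, dtype=float)
    a = np.asarray(accuracy, dtype=float)
    area = np.sum((a[1:] + a[:-1]) / 2.0 * np.diff(n))
    return float(area / (n[-1] - n[0]))
```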
Tool Input
{
  "op": "replace",
  "filename": "badge/query_strategies/custom_sampling.py",
  "content": "class CustomSampling(Strategy):\n \"\"\"Rarity-weighted density-diversified active learning.\n\n Scores unlabeled points by predictive uncertainty tempered by under-covered\n classes (estimated from the labeled set), then greedily builds a diverse\n batch in embedding space while penalizing isolated outliers via local\n density.\"\"\"\n\n def __init__(self, X, Y, idxs_lb, net, handler, args):\n super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n\n def query(self, n):\n to_np = lambda z: z.detach().cpu().numpy() if hasattr(z, 'detach') else (z.cpu().numpy() if hasattr(z, 'cpu') else np.asarray(z))\n idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n if len(idxs_unlabeled) <= n:\n return idxs_unlabeled\n Y = to_np(self.Y).astype(int)\n probs = to_np(self.predict_prob(self.X[idxs_unlabeled], Y[idxs_unlabeled])).astype(float)\n entropy = -(probs * np.log(probs + 1e-12)).sum(1) / max(np.log(probs.shape[1]), 1e-12)\n cls_cnt = np.bincount(Y[self.idxs_lb], minlength=probs.shape[1]).astype(float)\n rarity = probs @ (1.0 / np.sqrt(cls_cnt + 1.0))\n rarity /= rarity.max() + 1e-12\n score = entropy * (0.5 + 0.5 * rarity)\n m = min(len(idxs_unlabeled), max(8 * n, 256))\n cand = np.argpartition(score, -m)[-m:]\n cand = cand[np.argsort(score[cand])[::-1]]\n emb = to_np(self.get_embedding(self.X[idxs_unlabeled[cand]], Y[idxs_unlabeled[cand]])).astype(float)\n emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12\n D = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)\n k = min(10, max(1, len(cand) - 1))\n density = 1.0 / (np.sort(D + np.eye(len(cand)) * 1e9, axis=1)[:, :k].mean(1) + 1e-6)\n density /= density.max() + 1e-12\n base = score[cand] * (0.35 + 0.65 * density)\n chosen = [int(np.argmax(base))]\n min_d = D[chosen[0]].copy()\n scale = np.median(D[D > 0]) + 1e-12 if np.any(D > 0) else 1.0\n for _ in range(1, min(n, len(cand))):\n util = base * (1.0 + min_d / scale)\n util[chosen] = -1\n j = int(np.argmax(util))\n chosen.append(j)\n min_d = np.minimum(min_d, D[j])\n return idxs_unlabeled[cand[np.asarray(chosen, dtype=int)]]",
  "after_line": 0,
  "start_line": 28,
  "end_line": 54
}
OK: Replaced lines 28..54 with 43 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–70.
[Current file: badge/query_strategies/custom_sampling.py | editable: 28–70 | total: 70 lines]
28 class CustomSampling(Strategy):
29 """Rarity-weighted density-diversified active learning.
30
... (37 more lines) ...
68 chosen.append(j)
69 min_d = np.minimum(min_d, D[j])
70 return idxs_unlabeled[cand[np.asarray(chosen, dtype=int)]]
[Test #1] (2 tests remaining; call submit(n=N) to choose which test result to submit as final)
### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 230, in <module>
main()
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 81, in main
data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)
### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-le
Tool Input
{
  "op": "replace",
  "filename": "badge/query_strategies/custom_sampling.py",
  "content": "class CustomSampling(Strategy):\n \"\"\"Balanced gradient-diversity active learning.\n\n Uses uncertainty tempered by labeled-class scarcity, filters out isolated\n outliers with embedding-space density, and greedily assembles a batch that\n is diverse in gradient space while spreading queries across pseudo-classes.\"\"\"\n\n def __init__(self, X, Y, idxs_lb, net, handler, args):\n super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n\n def query(self, n):\n to_np = lambda z: z.detach().cpu().numpy() if hasattr(z, 'detach') else (z.cpu().numpy() if hasattr(z, 'cpu') else np.asarray(z))\n idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n if len(idxs_unlabeled) <= n:\n return idxs_unlabeled\n Y = to_np(self.Y).astype(int)\n probs = to_np(self.predict_prob(self.X[idxs_unlabeled], Y[idxs_unlabeled])).astype(float)\n entropy = -(probs * np.log(probs + 1e-12)).sum(1) / max(np.log(probs.shape[1]), 1e-12)\n cls_cnt = np.bincount(Y[self.idxs_lb], minlength=probs.shape[1]).astype(float)\n rarity = probs @ (1.0 / np.sqrt(cls_cnt + 1.0))\n rarity /= rarity.max() + 1e-12\n score = entropy * (0.4 + 0.6 * rarity)\n m = min(len(idxs_unlabeled), max(8 * n, 256))\n cand = np.argpartition(score, -m)[-m:]\n cand = cand[np.argsort(score[cand])[::-1]]\n pool = idxs_unlabeled[cand]\n emb = to_np(self.get_embedding(self.X[pool], Y[pool])).astype(float)\n grad = to_np(self.get_grad_embedding(self.X[pool], Y[pool])).astype(float)\n emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12\n grad /= np.linalg.norm(grad, axis=1, keepdims=True) + 1e-12\n De = np.clip(2.0 - 2.0 * (emb @ emb.T), 0.0, None)\n Dg = np.clip(2.0 - 2.0 * (grad @ grad.T), 0.0, None)\n k = min(10, max(1, len(cand) - 1))\n density = 1.0 / (np.sort(De + np.eye(len(cand)) * 1e9, axis=1)[:, :k].mean(1) + 1e-6)\n density /= density.max() + 1e-12\n base = score[cand] * (0.3 + 0.7 * density)\n pred = probs[cand].argmax(1)\n cls_sel = np.zeros(probs.shape[1], dtype=float)\n min_d = np.full(len(cand), Dg.max() if np.any(Dg) else 1.0, dtype=float)\n scale = np.median(Dg[Dg > 0]) + 1e-12 if np.any(Dg > 0) else 1.0\n chosen = []\n for _ in range(min(n, len(cand))):\n util = base * (1.0 + min_d / scale) / np.sqrt(1.0 + cls_sel[pred])\n util[chosen] = -1\n j = int(np.argmax(util))\n chosen.append(j)\n cls_sel[pred[j]] += 1.0\n min_d = np.minimum(min_d, Dg[j])\n return pool[np.asarray(chosen, dtype=int)]",
  "after_line": 0,
  "start_line": 28,
  "end_line": 70
}
OK: Replaced lines 28..70 with 49 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–76.
[Current file: badge/query_strategies/custom_sampling.py | editable: 28–76 | total: 76 lines]
28 class CustomSampling(Strategy):
29 """Balanced gradient-diversity active learning.
30
... (43 more lines) ...
74 cls_sel[pred[j]] += 1.0
75 min_d = np.minimum(min_d, Dg[j])
76 return pool[np.asarray(chosen, dtype=int)]
[Test #2] (1 test remaining; call submit(n=N) to choose which test result to submit as final)
### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 230, in <module>
main()
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 81, in main
data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)
### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-lea
Tool Input
{
  "n": 2
}
[submit] Submitting result from test #2 as final.
### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 230, in <module>
main()
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 81, in main
data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)
### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/r
Tool Input
{
  "op": "replace",
  "filename": "badge/query_strategies/custom_sampling.py",
  "content": "class CustomSampling(Strategy):\n \"\"\"Prototype-aware balanced gradient covering.\n\n Novel batch query rule for tabular AL:\n 1) score uncertainty with a class-rarity prior from the labeled set,\n 2) prefer candidates that are novel relative to labeled prototypes of their\n pseudo-class while remaining on dense regions of the pool,\n 3) greedily cover gradient space with a pseudo-class balancing penalty.\n \"\"\"\n\n def __init__(self, X, Y, idxs_lb, net, handler, args):\n super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n\n def query(self, n):\n to_np = lambda z: z.detach().cpu().numpy() if hasattr(z, 'detach') else (z.cpu().numpy() if hasattr(z, 'cpu') else np.asarray(z))\n idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n if len(idxs_unlabeled) <= n:\n return idxs_unlabeled\n Y = to_np(self.Y).astype(int)\n probs_u = to_np(self.predict_prob(self.X[idxs_unlabeled], Y[idxs_unlabeled])).astype(float)\n C = probs_u.shape[1]\n ent = -(probs_u * np.log(probs_u + 1e-12)).sum(1) / max(np.log(C), 1e-12)\n pred_u = probs_u.argmax(1)\n cnt = np.bincount(Y[self.idxs_lb], minlength=C).astype(float)\n rarity = probs_u @ (1.0 / np.sqrt(cnt + 1.0))\n rarity /= rarity.max() + 1e-12\n seed = ent * (0.45 + 0.55 * rarity)\n m = min(len(idxs_unlabeled), max(10 * n, 256))\n cand = np.argpartition(seed, -m)[-m:]\n cand = cand[np.argsort(seed[cand])[::-1]]\n pool = idxs_unlabeled[cand]\n emb_c = to_np(self.get_embedding(self.X[pool], Y[pool])).astype(float)\n emb_c /= np.linalg.norm(emb_c, axis=1, keepdims=True) + 1e-12\n pred_c = pred_u[cand]\n if np.any(self.idxs_lb):\n idxs_l = np.arange(self.n_pool)[self.idxs_lb]\n emb_l = to_np(self.get_embedding(self.X[idxs_l], Y[idxs_l])).astype(float)\n emb_l /= np.linalg.norm(emb_l, axis=1, keepdims=True) + 1e-12\n yl = Y[idxs_l]\n proto = np.zeros((C, emb_c.shape[1]), dtype=float)\n for c in range(C):\n msk = yl == c\n proto[c] = emb_l[msk].mean(0) if np.any(msk) else 0.0\n proto /= np.linalg.norm(proto, axis=1, keepdims=True) + 1e-12\n novelty = np.clip(1.0 - (emb_c * proto[pred_c]).sum(1), 0.0, None)\n else:\n novelty = np.ones(len(pool), dtype=float)\n De = np.clip(2.0 - 2.0 * (emb_c @ emb_c.T), 0.0, None)\n k = min(10, max(1, len(pool) - 1))\n density = 1.0 / (np.sort(De + np.eye(len(pool)) * 1e9, axis=1)[:, :k].mean(1) + 1e-6)\n density /= density.max() + 1e-12\n grad = to_np(self.get_grad_embedding(self.X[pool], Y[pool])).astype(float)\n grad /= np.linalg.norm(grad, axis=1, keepdims=True) + 1e-12\n Dg = np.clip(2.0 - 2.0 * (grad @ grad.T), 0.0, None)\n base = seed[cand] * (0.25 + 0.45 * density + 0.30 * novelty)\n chosen, cls_sel = [], np.zeros(C, dtype=float)\n min_d = np.full(len(pool), Dg.max() if np.any(Dg) else 1.0, dtype=float)\n scale = np.median(Dg[Dg > 0]) + 1e-12 if np.any(Dg > 0) else 1.0\n for _ in range(min(n, len(pool))):\n util = base * (1.0 + min_d / scale) / np.sqrt(1.0 + cls_sel[pred_c])\n util[chosen] = -1.0\n j = int(np.argmax(util))\n chosen.append(j)\n cls_sel[pred_c[j]] += 1.0\n min_d = np.minimum(min_d, Dg[j])\n return pool[np.asarray(chosen, dtype=int)]",
  "after_line": 0,
  "start_line": 28,
  "end_line": 76
}
OK: Replaced lines 28..76 with 66 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–93.
[Current file: badge/query_strategies/custom_sampling.py | editable: 28–93 | total: 93 lines]
28 class CustomSampling(Strategy):
29 """Prototype-aware balanced gradient covering.
30
... (60 more lines) ...
91 cls_sel[pred_c[j]] += 1.0
92 min_d = np.minimum(min_d, Dg[j])
93 return pool[np.asarray(chosen, dtype=int)]
[Test #3] (0 tests remaining; call submit(n=N) to choose which test result to submit as final)
[NOTE] This was your last test. You MUST now call submit(n=X) to choose which test result to submit as your final answer.
### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 230, in <module>
main()
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 81, in main
data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)
### spambase (scripts/spambase.sh)
[C
Tool Input
{
  "n": 3
}
[submit] Submitting result from test #3 as final.
### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 230, in <module>
main()
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 81, in main
data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)
### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/r
Tool Input
{
  "op": "replace",
  "filename": "badge/query_strategies/custom_sampling.py",
  "content": "class CustomSampling(Strategy):\n \"\"\"Curriculum prototype-aware gradient covering.\n\n Query rule:\n 1) score uncertainty with a rarity prior from labeled class counts,\n 2) keep candidates that are both dense and novel relative to labeled\n prototypes of their pseudo-class,\n 3) greedily maximize gradient-space coverage with a pseudo-class balancing\n penalty, using an AL-stage curriculum that shifts from density early to\n prototype novelty later.\n \"\"\"\n\n def __init__(self, X, Y, idxs_lb, net, handler, args):\n super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n\n def query(self, n):\n to_np = lambda z: z.detach().cpu().numpy() if hasattr(z, 'detach') else (z.cpu().numpy() if hasattr(z, 'cpu') else np.asarray(z))\n idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n if len(idxs_unlabeled) <= n:\n return idxs_unlabeled\n Y = to_np(self.Y).astype(int)\n probs_u = to_np(self.predict_prob(self.X[idxs_unlabeled], Y[idxs_unlabeled])).astype(float)\n C = probs_u.shape[1]\n ent = -(probs_u * np.log(probs_u + 1e-12)).sum(1) / max(np.log(C), 1e-12)\n pred_u = probs_u.argmax(1)\n cnt = np.bincount(Y[self.idxs_lb], minlength=C).astype(float)\n rarity = probs_u @ (1.0 / np.sqrt(cnt + 1.0))\n rarity /= rarity.max() + 1e-12\n seed = ent * (0.45 + 0.55 * rarity)\n m = min(len(idxs_unlabeled), max(10 * n, 256))\n cand = np.argpartition(seed, -m)[-m:]\n cand = cand[np.argsort(seed[cand])[::-1]]\n pool = idxs_unlabeled[cand]\n emb_c = to_np(self.get_embedding(self.X[pool], Y[pool])).astype(float)\n emb_c /= np.linalg.norm(emb_c, axis=1, keepdims=True) + 1e-12\n pred_c = pred_u[cand]\n if np.any(self.idxs_lb):\n idxs_l = np.arange(self.n_pool)[self.idxs_lb]\n emb_l = to_np(self.get_embedding(self.X[idxs_l], Y[idxs_l])).astype(float)\n emb_l /= np.linalg.norm(emb_l, axis=1, keepdims=True) + 1e-12\n yl = Y[idxs_l]\n proto = np.zeros((C, emb_c.shape[1]), dtype=float)\n for c in range(C):\n msk = yl == c\n proto[c] = emb_l[msk].mean(0) if np.any(msk) else 0.0\n proto /= np.linalg.norm(proto, axis=1, keepdims=True) + 1e-12\n novelty = np.clip(1.0 - (emb_c * proto[pred_c]).sum(1), 0.0, None)\n else:\n novelty = np.ones(len(pool), dtype=float)\n De = np.clip(2.0 - 2.0 * (emb_c @ emb_c.T), 0.0, None)\n k = min(10, max(1, len(pool) - 1))\n density = 1.0 / (np.sort(De + np.eye(len(pool)) * 1e9, axis=1)[:, :k].mean(1) + 1e-6)\n density /= density.max() + 1e-12\n frac = float(np.mean(self.idxs_lb))\n early = max(0.0, 1.0 - 3.0 * frac)\n base = seed[cand] * (0.20 + (0.55 - 0.25 * early) * density + (0.25 + 0.25 * early) * novelty)\n grad = to_np(self.get_grad_embedding(self.X[pool], Y[pool])).astype(float)\n grad /= np.linalg.norm(grad, axis=1, keepdims=True) + 1e-12\n Dg = np.clip(2.0 - 2.0 * (grad @ grad.T), 0.0, None)\n chosen, cls_sel = [], np.zeros(C, dtype=float)\n min_d = np.full(len(pool), Dg.max() if np.any(Dg) else 1.0, dtype=float)\n scale = np.median(Dg[Dg > 0]) + 1e-12 if np.any(Dg > 0) else 1.0\n for _ in range(min(n, len(pool))):\n util = base * (1.0 + min_d / scale) / np.sqrt(1.0 + cls_sel[pred_c])\n util[chosen] = -1.0\n j = int(np.argmax(util))\n chosen.append(j)\n cls_sel[pred_c[j]] += 1.0\n min_d = np.minimum(min_d, Dg[j])\n return pool[np.asarray(chosen, dtype=int)]",
  "after_line": 0,
  "start_line": 28,
  "end_line": 93
}
OK: Replaced lines 28..93 with 70 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–97.
[Current file: badge/query_strategies/custom_sampling.py | editable: 28–97 | total: 97 lines]
28 class CustomSampling(Strategy):
29 """Curriculum prototype-aware gradient covering.
30
... (64 more lines) ...
95 cls_sel[pred_c[j]] += 1.0
96 min_d = np.minimum(min_d, Dg[j])
97 return pool[np.asarray(chosen, dtype=int)]