Agent Conversation: openai/gpt-5.4

Model: openai/gpt-5.4
Total steps: 20
Messages: 46
User (Step 0)

Task: ml-active-learning

Active Learning: Query Strategy Design

Research Question

Design a novel pool-based active learning query strategy that outperforms existing methods (uncertainty sampling, entropy sampling, BADGE, BAIT, BALD) across diverse tabular classification datasets.

Background

Active learning aims to minimize labeling cost by intelligently selecting which unlabeled samples to query for labels. In pool-based active learning, a query strategy selects batches of samples from an unlabeled pool to be labeled by an oracle, then the model is retrained. The goal is to achieve the highest possible accuracy with the fewest labeled samples.
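The pool-based loop just described (query a batch, label it, retrain, evaluate) can be sketched generically. The `query_fn`/`train_fn`/`eval_fn` hooks below are hypothetical placeholders for illustration, not the benchmark's API:

```python
import numpy as np

def active_learning_loop(query_fn, train_fn, eval_fn, idxs_lb, n_rounds, batch_size):
    """Generic pool-based active learning loop (sketch; hooks are hypothetical).

    query_fn(idxs_lb, n) -> indices of unlabeled pool points to label next
    train_fn(idxs_lb)    -> retrains the model on the current labeled mask
    eval_fn()            -> test accuracy of the retrained model
    """
    curve = []
    for _ in range(n_rounds):
        new_idxs = query_fn(idxs_lb, batch_size)  # the oracle labels these
        idxs_lb[new_idxs] = True                  # grow the labeled set in place
        train_fn(idxs_lb)
        curve.append(eval_fn())                   # record the learning curve
    return idxs_lb, curve
```

The returned `curve` is exactly the accuracy-vs-labels trajectory that the evaluation protocol later summarizes.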

Classic approaches include:

  • Uncertainty Sampling: Select samples where the model is least confident (lowest max predicted probability)
  • Entropy Sampling: Select samples with highest predictive entropy
  • Query By Committee: Select samples with maximal disagreement among an ensemble
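The first two classic scores reduce to one line of numpy each. A minimal sketch, assuming a hypothetical `probs` array of softmax outputs:

```python
import numpy as np

# probs: hypothetical softmax outputs for the unlabeled pool, [n_points, n_classes]
probs = np.array([[0.9, 0.1],
                  [0.5, 0.5],
                  [0.7, 0.3]])

# Uncertainty sampling: least confident first (lowest max predicted probability)
uncertainty = 1.0 - probs.max(axis=1)

# Entropy sampling: highest predictive entropy first (eps guards log(0))
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

query_order = np.argsort(-entropy)  # the maximally uncertain point comes first
```

Both scores rank the ambiguous [0.5, 0.5] point highest; they diverge only on multi-class distributions, where entropy also accounts for mass spread across non-argmax classes.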

Modern approaches incorporate diversity and information-theoretic principles:

  • BADGE (Ash et al., ICLR 2020): Uses gradient embeddings with k-means++ for diverse, uncertain batch selection
  • BAIT (Ash et al., NeurIPS 2021): Optimizes Fisher information to select maximally informative batches
  • BALD (Houlsby et al., 2011): Uses MC Dropout to estimate mutual information between predictions and parameters
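As one illustration of the information-theoretic family, the BALD acquisition can be approximated from MC-dropout samples: mutual information is the entropy of the mean prediction minus the mean of the per-sample entropies. A sketch, not the authors' reference code:

```python
import numpy as np

def bald_score(mc_probs):
    """Approximate BALD: mutual information between predictions and model
    parameters, estimated from MC-dropout forward passes.

    mc_probs: shape [n_drop, n_points, n_classes]
    """
    mean_p = mc_probs.mean(axis=0)                                 # [n_points, n_classes]
    entropy_of_mean = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=1)
    mean_of_entropies = -(mc_probs * np.log(mc_probs + 1e-12)).sum(axis=2).mean(axis=0)
    return entropy_of_mean - mean_of_entropies
```

A point the dropout committee disagrees on (each pass confident, but in different classes) scores high; a point every pass finds equally ambiguous scores near zero, since that uncertainty is aleatoric rather than model uncertainty.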

Task

Modify the CustomSampling class in badge/query_strategies/custom_sampling.py to implement a novel query strategy. The strategy must implement the query(n) method that returns n indices from the unlabeled pool.

Interface

class CustomSampling(Strategy):
    def __init__(self, X, Y, idxs_lb, net, handler, args):
        super().__init__(X, Y, idxs_lb, net, handler, args)

    def query(self, n) -> np.ndarray:
        # Must return n indices into self.X of unlabeled samples to label
        ...

Available from the Strategy base class:

  • self.X: pool features (numpy array, shape [n_pool, n_features])
  • self.Y: pool labels (torch LongTensor, shape [n_pool])
  • self.idxs_lb: boolean mask of labeled samples
  • self.n_pool: total pool size
  • self.predict_prob(X, Y): softmax probabilities [len(X), n_classes]
  • self.predict_prob_dropout_split(X, Y, n_drop): MC dropout probs [n_drop, len(X), n_classes]
  • self.get_embedding(X, Y): penultimate-layer embeddings [len(X), emb_dim]
  • self.get_grad_embedding(X, Y): gradient embeddings [len(X), emb_dim * n_classes]
  • self.get_exp_grad_embedding(X, Y): expected Fisher embeddings [len(X), n_classes, emb_dim]
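As a point of reference, the simplest query(n) one could build from these hooks is plain entropy sampling over the unlabeled mask. The standalone helper below mimics that logic on precomputed probabilities; it is a sketch, not the repo's Strategy subclass:

```python
import numpy as np

def entropy_query(probs, idxs_lb, n):
    """Baseline batch query: the n unlabeled points with highest entropy.

    probs:   softmax probabilities for the whole pool, [n_pool, n_classes]
             (what self.predict_prob would return for the unlabeled slice)
    idxs_lb: boolean mask of already-labeled pool points (self.idxs_lb)
    n:       query batch size
    """
    idxs_unlabeled = np.arange(len(probs))[~idxs_lb]
    p = probs[idxs_unlabeled]
    ent = -(p * np.log(p + 1e-12)).sum(axis=1)
    return idxs_unlabeled[np.argsort(-ent)[:n]]  # pool-level indices, as required
```

Inside a real Strategy subclass the same pattern applies, with `probs` obtained from self.predict_prob on the unlabeled slice and the returned indices drawn from the pool, exactly as the interface requires.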

Evaluation

  • Datasets: 3 OpenML tabular classification datasets (letter recognition, spambase, splice)
  • Protocol: 20 rounds of batch active learning, evaluated after each round
  • Metrics:
    • accuracy: Test accuracy at the end of 20 AL rounds (fixed label budget)
    • auc: Area under the learning curve (accuracy vs. number of labeled samples), measuring sample efficiency
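The auc metric can be computed with a simple trapezoidal rule over the learning curve; normalizing by the total label range (so a flat curve at accuracy a scores a) is an assumption about the benchmark's exact definition:

```python
import numpy as np

def learning_curve_auc(n_labeled, accuracy):
    """Trapezoidal area under accuracy vs. number of labeled samples,
    normalized by the label range so a constant-accuracy curve scores
    that constant (normalization convention is an assumption)."""
    x = np.asarray(n_labeled, dtype=float)
    y = np.asarray(accuracy, dtype=float)
    area = 0.5 * ((y[1:] + y[:-1]) * np.diff(x)).sum()  # trapezoid rule
    return area / (x[-1] - x[0])
```

Under this definition, two strategies ending at the same final accuracy are separated by how quickly accuracy rose, which is precisely the sample-efficiency signal auc is meant to capture.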
Assistant (Step 1, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "badge/query_strategies/custom_sampling.py",
4  "content": "class CustomSampling(Strategy):\n    \"\"\"Rarity-weighted density-diversified active learning.\n\n    Scores unlabeled points by predictive uncertainty tempered by under-covered\n    classes (estimated from the labeled set), then greedily builds a diverse\n    batch in embedding space while penalizing isolated outliers via local\n    density.\"\"\"\n\n    def __init__(self, X, Y, idxs_lb, net, handler, args):\n        super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n\n    def query(self, n):\n        to_np = lambda z: z.detach().cpu().numpy() if hasattr(z, 'detach') else (z.cpu().numpy() if hasattr(z, 'cpu') else np.asarray(z))\n        idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n        if len(idxs_unlabeled) <= n:\n            return idxs_unlabeled\n        Y = to_np(self.Y).astype(int)\n        probs = to_np(self.predict_prob(self.X[idxs_unlabeled], Y[idxs_unlabeled])).astype(float)\n        entropy = -(probs * np.log(probs + 1e-12)).sum(1) / max(np.log(probs.shape[1]), 1e-12)\n        cls_cnt = np.bincount(Y[self.idxs_lb], minlength=probs.shape[1]).astype(float)\n        rarity = probs @ (1.0 / np.sqrt(cls_cnt + 1.0))\n        rarity /= rarity.max() + 1e-12\n        score = entropy * (0.5 + 0.5 * rarity)\n        m = min(len(idxs_unlabeled), max(8 * n, 256))\n        cand = np.argpartition(score, -m)[-m:]\n        cand = cand[np.argsort(score[cand])[::-1]]\n        emb = to_np(self.get_embedding(self.X[idxs_unlabeled[cand]], Y[idxs_unlabeled[cand]])).astype(float)\n        emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12\n        D = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)\n        k = min(10, max(1, len(cand) - 1))\n        density = 1.0 / (np.sort(D + np.eye(len(cand)) * 1e9, axis=1)[:, :k].mean(1) + 1e-6)\n        density /= density.max() + 1e-12\n        base = score[cand] * (0.35 + 0.65 * density)\n        chosen = [int(np.argmax(base))]\n        min_d = D[chosen[0]].copy()\n        
scale = np.median(D[D > 0]) + 1e-12 if np.any(D > 0) else 1.0\n        for _ in range(1, min(n, len(cand))):\n            util = base * (1.0 + min_d / scale)\n            util[chosen] = -1\n            j = int(np.argmax(util))\n            chosen.append(j)\n            min_d = np.minimum(min_d, D[j])\n        return idxs_unlabeled[cand[np.asarray(chosen, dtype=int)]]",
5  "after_line": 0,
6  "start_line": 28,
7  "end_line": 54
8}
Tool Result (Step 1)
OK: Replaced lines 28..54 with 43 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–70.

[Current file: badge/query_strategies/custom_sampling.py | editable: 28–70 | total: 70 lines]
    28  class CustomSampling(Strategy):
    29      """Rarity-weighted density-diversified active learning.
    30  
       ... (37 more lines) ...
    68              chosen.append(j)
    69              min_d = np.minimum(min_d, D[j])
    70          return idxs_unlabeled[cand[np.asarray(chosen, dtype=int)]]
Assistant (Step 2, test)
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=N) to choose which test result to submit as final)

### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 230, in <module>
    main()
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 81, in main
    data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)


### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-le
Assistant (Step 3, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "badge/query_strategies/custom_sampling.py",
4  "content": "class CustomSampling(Strategy):\n    \"\"\"Balanced gradient-diversity active learning.\n\n    Uses uncertainty tempered by labeled-class scarcity, filters out isolated\n    outliers with embedding-space density, and greedily assembles a batch that\n    is diverse in gradient space while spreading queries across pseudo-classes.\"\"\"\n\n    def __init__(self, X, Y, idxs_lb, net, handler, args):\n        super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n\n    def query(self, n):\n        to_np = lambda z: z.detach().cpu().numpy() if hasattr(z, 'detach') else (z.cpu().numpy() if hasattr(z, 'cpu') else np.asarray(z))\n        idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n        if len(idxs_unlabeled) <= n:\n            return idxs_unlabeled\n        Y = to_np(self.Y).astype(int)\n        probs = to_np(self.predict_prob(self.X[idxs_unlabeled], Y[idxs_unlabeled])).astype(float)\n        entropy = -(probs * np.log(probs + 1e-12)).sum(1) / max(np.log(probs.shape[1]), 1e-12)\n        cls_cnt = np.bincount(Y[self.idxs_lb], minlength=probs.shape[1]).astype(float)\n        rarity = probs @ (1.0 / np.sqrt(cls_cnt + 1.0))\n        rarity /= rarity.max() + 1e-12\n        score = entropy * (0.4 + 0.6 * rarity)\n        m = min(len(idxs_unlabeled), max(8 * n, 256))\n        cand = np.argpartition(score, -m)[-m:]\n        cand = cand[np.argsort(score[cand])[::-1]]\n        pool = idxs_unlabeled[cand]\n        emb = to_np(self.get_embedding(self.X[pool], Y[pool])).astype(float)\n        grad = to_np(self.get_grad_embedding(self.X[pool], Y[pool])).astype(float)\n        emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12\n        grad /= np.linalg.norm(grad, axis=1, keepdims=True) + 1e-12\n        De = np.clip(2.0 - 2.0 * (emb @ emb.T), 0.0, None)\n        Dg = np.clip(2.0 - 2.0 * (grad @ grad.T), 0.0, None)\n        k = min(10, max(1, len(cand) - 1))\n        density = 1.0 / (np.sort(De + np.eye(len(cand)) * 1e9, axis=1)[:, 
:k].mean(1) + 1e-6)\n        density /= density.max() + 1e-12\n        base = score[cand] * (0.3 + 0.7 * density)\n        pred = probs[cand].argmax(1)\n        cls_sel = np.zeros(probs.shape[1], dtype=float)\n        min_d = np.full(len(cand), Dg.max() if np.any(Dg) else 1.0, dtype=float)\n        scale = np.median(Dg[Dg > 0]) + 1e-12 if np.any(Dg > 0) else 1.0\n        chosen = []\n        for _ in range(min(n, len(cand))):\n            util = base * (1.0 + min_d / scale) / np.sqrt(1.0 + cls_sel[pred])\n            util[chosen] = -1\n            j = int(np.argmax(util))\n            chosen.append(j)\n            cls_sel[pred[j]] += 1.0\n            min_d = np.minimum(min_d, Dg[j])\n        return pool[np.asarray(chosen, dtype=int)]",
5  "after_line": 0,
6  "start_line": 28,
7  "end_line": 70
8}
Tool Result (Step 3)
OK: Replaced lines 28..70 with 49 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–76.

[Current file: badge/query_strategies/custom_sampling.py | editable: 28–76 | total: 76 lines]
    28  class CustomSampling(Strategy):
    29      """Balanced gradient-diversity active learning.
    30  
       ... (43 more lines) ...
    74              cls_sel[pred[j]] += 1.0
    75              min_d = np.minimum(min_d, Dg[j])
    76          return pool[np.asarray(chosen, dtype=int)]
Assistant (Step 4, test)
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=N) to choose which test result to submit as final)

### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 230, in <module>
    main()
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 81, in main
    data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)


### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-lea
Assistant (Step 5, submit)
Tool Input
1{
2  "n": 2
3}
Tool Result (Step 5)
[submit] Submitting result from test #2 as final.

### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 230, in <module>
    main()
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 81, in main
    data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)


### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/r
Assistant (Step 5, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "badge/query_strategies/custom_sampling.py",
4  "content": "class CustomSampling(Strategy):\n    \"\"\"Prototype-aware balanced gradient covering.\n\n    Novel batch query rule for tabular AL:\n    1) score uncertainty with a class-rarity prior from the labeled set,\n    2) prefer candidates that are novel relative to labeled prototypes of their\n       pseudo-class while remaining on dense regions of the pool,\n    3) greedily cover gradient space with a pseudo-class balancing penalty.\n    \"\"\"\n\n    def __init__(self, X, Y, idxs_lb, net, handler, args):\n        super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n\n    def query(self, n):\n        to_np = lambda z: z.detach().cpu().numpy() if hasattr(z, 'detach') else (z.cpu().numpy() if hasattr(z, 'cpu') else np.asarray(z))\n        idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n        if len(idxs_unlabeled) <= n:\n            return idxs_unlabeled\n        Y = to_np(self.Y).astype(int)\n        probs_u = to_np(self.predict_prob(self.X[idxs_unlabeled], Y[idxs_unlabeled])).astype(float)\n        C = probs_u.shape[1]\n        ent = -(probs_u * np.log(probs_u + 1e-12)).sum(1) / max(np.log(C), 1e-12)\n        pred_u = probs_u.argmax(1)\n        cnt = np.bincount(Y[self.idxs_lb], minlength=C).astype(float)\n        rarity = probs_u @ (1.0 / np.sqrt(cnt + 1.0))\n        rarity /= rarity.max() + 1e-12\n        seed = ent * (0.45 + 0.55 * rarity)\n        m = min(len(idxs_unlabeled), max(10 * n, 256))\n        cand = np.argpartition(seed, -m)[-m:]\n        cand = cand[np.argsort(seed[cand])[::-1]]\n        pool = idxs_unlabeled[cand]\n        emb_c = to_np(self.get_embedding(self.X[pool], Y[pool])).astype(float)\n        emb_c /= np.linalg.norm(emb_c, axis=1, keepdims=True) + 1e-12\n        pred_c = pred_u[cand]\n        if np.any(self.idxs_lb):\n            idxs_l = np.arange(self.n_pool)[self.idxs_lb]\n            emb_l = to_np(self.get_embedding(self.X[idxs_l], Y[idxs_l])).astype(float)\n            emb_l /= 
np.linalg.norm(emb_l, axis=1, keepdims=True) + 1e-12\n            yl = Y[idxs_l]\n            proto = np.zeros((C, emb_c.shape[1]), dtype=float)\n            for c in range(C):\n                msk = yl == c\n                proto[c] = emb_l[msk].mean(0) if np.any(msk) else 0.0\n            proto /= np.linalg.norm(proto, axis=1, keepdims=True) + 1e-12\n            novelty = np.clip(1.0 - (emb_c * proto[pred_c]).sum(1), 0.0, None)\n        else:\n            novelty = np.ones(len(pool), dtype=float)\n        De = np.clip(2.0 - 2.0 * (emb_c @ emb_c.T), 0.0, None)\n        k = min(10, max(1, len(pool) - 1))\n        density = 1.0 / (np.sort(De + np.eye(len(pool)) * 1e9, axis=1)[:, :k].mean(1) + 1e-6)\n        density /= density.max() + 1e-12\n        grad = to_np(self.get_grad_embedding(self.X[pool], Y[pool])).astype(float)\n        grad /= np.linalg.norm(grad, axis=1, keepdims=True) + 1e-12\n        Dg = np.clip(2.0 - 2.0 * (grad @ grad.T), 0.0, None)\n        base = seed[cand] * (0.25 + 0.45 * density + 0.30 * novelty)\n        chosen, cls_sel = [], np.zeros(C, dtype=float)\n        min_d = np.full(len(pool), Dg.max() if np.any(Dg) else 1.0, dtype=float)\n        scale = np.median(Dg[Dg > 0]) + 1e-12 if np.any(Dg > 0) else 1.0\n        for _ in range(min(n, len(pool))):\n            util = base * (1.0 + min_d / scale) / np.sqrt(1.0 + cls_sel[pred_c])\n            util[chosen] = -1.0\n            j = int(np.argmax(util))\n            chosen.append(j)\n            cls_sel[pred_c[j]] += 1.0\n            min_d = np.minimum(min_d, Dg[j])\n        return pool[np.asarray(chosen, dtype=int)]",
5  "after_line": 0,
6  "start_line": 28,
7  "end_line": 76
8}
Tool Result (Step 5)
OK: Replaced lines 28..76 with 66 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–93.

[Current file: badge/query_strategies/custom_sampling.py | editable: 28–93 | total: 93 lines]
    28  class CustomSampling(Strategy):
    29      """Prototype-aware balanced gradient covering.
    30  
       ... (60 more lines) ...
    91              cls_sel[pred_c[j]] += 1.0
    92              min_d = np.minimum(min_d, Dg[j])
    93          return pool[np.asarray(chosen, dtype=int)]
Assistant (Step 6, test)
Tool Result (Step 6)
[Test #3] (0 tests remaining; call submit(n=N) to choose which test result to submit as final)

[NOTE] This was your last test. You MUST now call submit(n=X) to choose which test result to submit as your final answer.

### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 230, in <module>
    main()
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 81, in main
    data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)


### spambase (scripts/spambase.sh)
[C
Assistant (Step 7, submit)
Tool Input
1{
2  "n": 3
3}
Tool Result (Step 7)
[submit] Submitting result from test #3 as final.

### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 230, in <module>
    main()
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/run_al.py", line 81, in main
    data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)


### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/openai_gpt-5.4_20260414_110948/badge/r
Assistant (Step 7, edit)
Tool Input
1{
2  "op": "replace",
3  "filename": "badge/query_strategies/custom_sampling.py",
4  "content": "class CustomSampling(Strategy):\n    \"\"\"Curriculum prototype-aware gradient covering.\n\n    Query rule:\n    1) score uncertainty with a rarity prior from labeled class counts,\n    2) keep candidates that are both dense and novel relative to labeled\n       prototypes of their pseudo-class,\n    3) greedily maximize gradient-space coverage with a pseudo-class balancing\n       penalty, using an AL-stage curriculum that shifts from density early to\n       prototype novelty later.\n    \"\"\"\n\n    def __init__(self, X, Y, idxs_lb, net, handler, args):\n        super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n\n    def query(self, n):\n        to_np = lambda z: z.detach().cpu().numpy() if hasattr(z, 'detach') else (z.cpu().numpy() if hasattr(z, 'cpu') else np.asarray(z))\n        idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n        if len(idxs_unlabeled) <= n:\n            return idxs_unlabeled\n        Y = to_np(self.Y).astype(int)\n        probs_u = to_np(self.predict_prob(self.X[idxs_unlabeled], Y[idxs_unlabeled])).astype(float)\n        C = probs_u.shape[1]\n        ent = -(probs_u * np.log(probs_u + 1e-12)).sum(1) / max(np.log(C), 1e-12)\n        pred_u = probs_u.argmax(1)\n        cnt = np.bincount(Y[self.idxs_lb], minlength=C).astype(float)\n        rarity = probs_u @ (1.0 / np.sqrt(cnt + 1.0))\n        rarity /= rarity.max() + 1e-12\n        seed = ent * (0.45 + 0.55 * rarity)\n        m = min(len(idxs_unlabeled), max(10 * n, 256))\n        cand = np.argpartition(seed, -m)[-m:]\n        cand = cand[np.argsort(seed[cand])[::-1]]\n        pool = idxs_unlabeled[cand]\n        emb_c = to_np(self.get_embedding(self.X[pool], Y[pool])).astype(float)\n        emb_c /= np.linalg.norm(emb_c, axis=1, keepdims=True) + 1e-12\n        pred_c = pred_u[cand]\n        if np.any(self.idxs_lb):\n            idxs_l = np.arange(self.n_pool)[self.idxs_lb]\n            emb_l = to_np(self.get_embedding(self.X[idxs_l], 
Y[idxs_l])).astype(float)\n            emb_l /= np.linalg.norm(emb_l, axis=1, keepdims=True) + 1e-12\n            yl = Y[idxs_l]\n            proto = np.zeros((C, emb_c.shape[1]), dtype=float)\n            for c in range(C):\n                msk = yl == c\n                proto[c] = emb_l[msk].mean(0) if np.any(msk) else 0.0\n            proto /= np.linalg.norm(proto, axis=1, keepdims=True) + 1e-12\n            novelty = np.clip(1.0 - (emb_c * proto[pred_c]).sum(1), 0.0, None)\n        else:\n            novelty = np.ones(len(pool), dtype=float)\n        De = np.clip(2.0 - 2.0 * (emb_c @ emb_c.T), 0.0, None)\n        k = min(10, max(1, len(pool) - 1))\n        density = 1.0 / (np.sort(De + np.eye(len(pool)) * 1e9, axis=1)[:, :k].mean(1) + 1e-6)\n        density /= density.max() + 1e-12\n        frac = float(np.mean(self.idxs_lb))\n        early = max(0.0, 1.0 - 3.0 * frac)\n        base = seed[cand] * (0.20 + (0.55 - 0.25 * early) * density + (0.25 + 0.25 * early) * novelty)\n        grad = to_np(self.get_grad_embedding(self.X[pool], Y[pool])).astype(float)\n        grad /= np.linalg.norm(grad, axis=1, keepdims=True) + 1e-12\n        Dg = np.clip(2.0 - 2.0 * (grad @ grad.T), 0.0, None)\n        chosen, cls_sel = [], np.zeros(C, dtype=float)\n        min_d = np.full(len(pool), Dg.max() if np.any(Dg) else 1.0, dtype=float)\n        scale = np.median(Dg[Dg > 0]) + 1e-12 if np.any(Dg > 0) else 1.0\n        for _ in range(min(n, len(pool))):\n            util = base * (1.0 + min_d / scale) / np.sqrt(1.0 + cls_sel[pred_c])\n            util[chosen] = -1.0\n            j = int(np.argmax(util))\n            chosen.append(j)\n            cls_sel[pred_c[j]] += 1.0\n            min_d = np.minimum(min_d, Dg[j])\n        return pool[np.asarray(chosen, dtype=int)]",
5  "after_line": 0,
6  "start_line": 28,
7  "end_line": 93
8}
Tool Result (Step 7)
OK: Replaced lines 28..93 with 70 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–97.

[Current file: badge/query_strategies/custom_sampling.py | editable: 28–97 | total: 97 lines]
    28  class CustomSampling(Strategy):
    29      """Curriculum prototype-aware gradient covering.
    30  
       ... (64 more lines) ...
    95              cls_sel[pred_c[j]] += 1.0
    96              min_d = np.minimum(min_d, Dg[j])
    97          return pool[np.asarray(chosen, dtype=int)]