Agent Conversation: deepseek-reasoner
Task: ml-active-learning
Active Learning: Query Strategy Design
Research Question
Design a novel pool-based active learning query strategy that outperforms existing methods (uncertainty sampling, entropy sampling, BADGE, BAIT, BALD) across diverse tabular classification datasets.
Background
Active learning aims to minimize labeling cost by intelligently selecting which unlabeled samples to query for labels. In pool-based active learning, a query strategy selects batches of samples from an unlabeled pool to be labeled by an oracle, then the model is retrained. The goal is to achieve the highest possible accuracy with the fewest labeled samples.
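The query-label-retrain loop described above can be sketched end to end. Everything below is invented for illustration: a toy two-blob pool stands in for a real dataset, a nearest-centroid rule stands in for the network, and a uniform-random placeholder stands in for the query strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool: two Gaussian blobs standing in for a real tabular dataset.
X = np.concatenate([rng.normal(-2.0, 1.0, (100, 2)),
                    rng.normal(2.0, 1.0, (100, 2))])
y = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

idxs_lb = np.zeros(len(X), dtype=bool)  # boolean mask of labeled samples
idxs_lb[:5] = True                      # seed set: 5 labeled points per class
idxs_lb[100:105] = True

def fit_centroids(X, y, mask):
    # Stand-in "model": per-class mean of the currently labeled points.
    return np.stack([X[mask & (y == c)].mean(axis=0) for c in (0, 1)])

def query_random(mask, n):
    # Placeholder query strategy: uniform over the unlabeled pool.
    return rng.choice(np.flatnonzero(~mask), size=n, replace=False)

for _ in range(5):                               # 5 rounds, batch size 10
    centroids = fit_centroids(X, y, idxs_lb)     # retrain on current labels
    idxs_lb[query_random(idxs_lb, 10)] = True    # oracle labels the batch

pred = np.argmin(((X[:, None, :] - centroids) ** 2).sum(axis=-1), axis=1)
accuracy = float((pred == y).mean())
```

Every query strategy discussed below plugs into the `query_random` slot; the loop itself stays fixed.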
Classic approaches include:
- Uncertainty Sampling: Select samples where the model is least confident (lowest max predicted probability)
- Entropy Sampling: Select samples with highest predictive entropy
- Query By Committee: Select samples with maximal disagreement among an ensemble
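Given the model's softmax outputs, the first two scores are one-liners; the probability matrix below is fabricated for illustration.

```python
import numpy as np

# Fabricated softmax outputs for 4 pool samples, 3 classes.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.34, 0.33, 0.33],
                  [0.60, 0.30, 0.10]])

# Uncertainty sampling: least confident = smallest max predicted probability.
least_confident = np.argmin(probs.max(axis=1))

# Entropy sampling: highest predictive entropy.
entropy = -(probs * np.log(probs)).sum(axis=1)
most_entropic = np.argmax(entropy)
# Both pick the near-uniform sample here (index 2).
```

The two rankings often agree at the extremes, as here, but can diverge on multi-class problems where probability mass is spread over several non-top classes.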
Modern approaches incorporate diversity and information-theoretic principles:
- BADGE (Ash et al., ICLR 2020): Uses gradient embeddings with k-means++ for diverse, uncertain batch selection
- BAIT (Ash et al., NeurIPS 2021): Optimizes Fisher information to select maximally informative batches
- BALD (Houlsby et al., 2011): Uses MC Dropout to estimate mutual information between predictions and parameters
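BALD's acquisition score is the mutual information between the prediction and the model parameters: the entropy of the dropout-averaged prediction minus the average entropy of the individual dropout predictions. A minimal NumPy sketch, with fabricated dropout outputs:

```python
import numpy as np

def bald_scores(probs_dropout, eps=1e-10):
    """Mutual information I[y; theta | x] per sample.

    probs_dropout: (n_drop, n_samples, n_classes) MC-dropout softmax outputs.
    Returns H[mean prediction] minus mean H[individual predictions].
    """
    pb = probs_dropout.mean(axis=0)                # E_theta p(y | x, theta)
    h_mean = -(pb * np.log(pb + eps)).sum(axis=1)  # H[y | x, D]
    mean_h = -(probs_dropout * np.log(probs_dropout + eps)).sum(axis=2).mean(axis=0)
    return h_mean - mean_h

# Fabricated dropout passes (n_drop=2) for two single-sample pools:
disagree = np.array([[[0.9, 0.1]], [[0.1, 0.9]]])  # confident but conflicting -> high MI
agree = np.array([[[0.5, 0.5]], [[0.5, 0.5]]])     # uncertain but consistent -> MI near 0
```

The `disagree` case is exactly what BALD prizes over plain entropy sampling: both cases have a maximally uncertain averaged prediction, but only the first reflects disagreement across parameter samples.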
Task
Modify the CustomSampling class in badge/query_strategies/custom_sampling.py to implement a novel query strategy. The strategy must implement the query(n) method that returns n indices from the unlabeled pool.
Interface
class CustomSampling(Strategy):
    def __init__(self, X, Y, idxs_lb, net, handler, args):
        super().__init__(X, Y, idxs_lb, net, handler, args)

    def query(self, n) -> np.ndarray:
        # Must return n indices into self.X of unlabeled samples to label
        ...
Available from the Strategy base class:
- self.X: pool features (numpy array, shape [n_pool, n_features])
- self.Y: pool labels (torch LongTensor, shape [n_pool])
- self.idxs_lb: boolean mask of labeled samples
- self.n_pool: total pool size
- self.predict_prob(X, Y): softmax probabilities [len(X), n_classes]
- self.predict_prob_dropout_split(X, Y, n_drop): MC dropout probs [n_drop, len(X), n_classes]
- self.get_embedding(X, Y): penultimate-layer embeddings [len(X), emb_dim]
- self.get_grad_embedding(X, Y): gradient embeddings [len(X), emb_dim * n_classes]
- self.get_exp_grad_embedding(X, Y): expected Fisher embeddings [len(X), n_classes, emb_dim]
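As a sketch of how these hooks compose, margin sampling (smallest gap between the top two softmax probabilities) can be written against plain arrays. Inside query(n), probs would come from self.predict_prob and the mask from self.idxs_lb, assuming the base-class API behaves as documented above; the helper below is self-contained so it can run on its own.

```python
import numpy as np

def margin_query(probs, idxs_lb, n):
    """Select the n unlabeled samples with the smallest top-1/top-2 margin.

    probs:   (n_pool, n_classes) softmax outputs
    idxs_lb: boolean mask of labeled samples (as in the Strategy base class)
    """
    idxs_unlabeled = np.flatnonzero(~idxs_lb)
    p = np.sort(probs[idxs_unlabeled], axis=1)  # ascending per row
    margin = p[:, -1] - p[:, -2]                # top-1 minus top-2 probability
    return idxs_unlabeled[np.argsort(margin)[:n]]

# Fabricated pool of 4 samples; sample 0 is already labeled.
probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.8, 0.2]])
idxs_lb = np.array([True, False, False, False])
picked = margin_query(probs, idxs_lb, 2)  # -> array([2, 3])
```

Note the invariant every strategy must satisfy: the returned values are indices into self.X restricted to the unlabeled pool, which is why the helper maps positions back through idxs_unlabeled.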
Evaluation
- Datasets: 3 OpenML tabular classification datasets (letter recognition, spambase, splice)
- Protocol: 20 rounds of batch active learning, evaluated after each round
- Metrics:
- accuracy: Test accuracy at the end of 20 AL rounds (fixed label budget)
- auc: Area under the learning curve (accuracy vs. number of labeled samples), measuring sample efficiency
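The auc metric is a trapezoidal area under the accuracy-vs-labels curve. The numbers below are invented, and the benchmark's exact normalization is not specified here; this sketch normalizes by the label-budget range so that constant accuracy 1.0 would score 1.0.

```python
import numpy as np

# Hypothetical learning curve: test accuracy after each AL round.
n_labeled = np.array([100, 200, 300, 400, 500])
accuracy = np.array([0.60, 0.72, 0.78, 0.81, 0.83])

# Trapezoidal area under the curve, normalized by the label-budget range.
widths = np.diff(n_labeled)
heights = (accuracy[1:] + accuracy[:-1]) / 2.0
auc = float((widths * heights).sum() / (n_labeled[-1] - n_labeled[0]))
# auc == 0.75625 here
```

Two strategies can reach the same final accuracy yet differ sharply in auc, which is why the benchmark reports both.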
Tool Input
1{
2 "op": "replace",
3 "filename": "badge/query_strategies/custom_sampling.py",
4 "start_line": 28,
5 "end_line": 54,
6 "content": "class CustomSampling(Strategy):\n \"\"\"Gradient-BALD: Combines Bayesian Active Learning by Disagreement (BALD)\n with gradient embeddings for diverse batch selection.\n \n Uses MC dropout to compute mutual information (BALD) for uncertainty,\n and gradient embeddings for diversity via k-means++ weighted by BALD scores.\n \"\"\"\n\n def __init__(self, X, Y, idxs_lb, net, handler, args):\n super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n self.n_drop = 10 # number of MC dropout samples\n\n def query(self, n):\n import torch\n from scipy import stats\n import numpy as np\n \n idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n if len(idxs_unlabeled) <= n:\n return idxs_unlabeled\n \n # 1. Compute BALD scores (mutual information)\n probs_dropout = self.predict_prob_dropout_split(\n self.X[idxs_unlabeled], self.Y.numpy()[idxs_unlabeled], self.n_drop\n ) # shape (n_drop, m, n_classes)\n pb = probs_dropout.mean(0) # mean over dropout samples\n entropy1 = (-pb * torch.log(pb + 1e-10)).sum(1) # H[y | x, D]\n entropy2 = (-probs_dropout * torch.log(probs_dropout + 1e-10)).sum(2).mean(0) # E_theta[H[y | x, theta]]\n bald_scores = entropy2 - entropy1 # mutual information\n bald_scores_np = bald_scores.numpy()\n \n # 2. Compute gradient embeddings\n grad_embs = self.get_grad_embedding(\n self.X[idxs_unlabeled], self.Y.numpy()[idxs_unlabeled]\n ) # shape (m, emb_dim * n_classes)\n grad_embs_np = grad_embs.numpy()\n \n # 3. Normalize BALD scores to [0,1] and gradient norms\n bald_scores_norm = (bald_scores_np - bald_scores_np.min()) / (bald_scores_np.max() - bald_scores_np.min() + 1e-10)\n grad_norms = np.linalg.norm(grad_embs_np, axis=1)\n grad_norms_norm = (grad_norms - grad_norms.min()) / (grad_norms.max() - grad_norms.min() + 1e-10)\n \n # 4. Combined score for initialization weighting\n combined_scores = bald_scores_norm * grad_norms_norm\n \n # 5. 
Perform k-means++ selection weighted by combined scores\n m = len(idxs_unlabeled)\n chosen = set()\n chosen_list = []\n \n # Precompute squared norms for distance calculations\n grad_norms_sq = np.sum(grad_embs_np ** 2, axis=1)\n \n def distance(i, j):\n # Euclidean distance between gradient embeddings\n return np.sqrt(grad_norms_sq[i] + grad_norms_sq[j] - 2 * np.dot(grad_embs_np[i], grad_embs_np[j]))\n \n # First centroid: sample with probability proportional to combined_scores\n if len(chosen) == 0:\n probs = combined_scores / combined_scores.sum()\n custom_dist = stats.rv_discrete(values=(np.arange(m), probs))\n ind = custom_dist.rvs(size=1)[0]\n chosen.add(ind)\n chosen_list.append(ind)\n \n # Initialize distances\n D2 = np.full(m, np.inf)\n for i in chosen:\n for j in range(m):\n if j not in chosen:\n d = distance(i, j)\n if d < D2[j]:\n D2[j] = d\n \n # Subsequent centroids: weighted by D2^2 * combined_scores\n while len(chosen) < n:\n if D2.sum() == 0:\n # Fallback: random from remaining\n remaining = list(set(range(m)) - chosen)\n ind = np.random.choice(remaining)\n else:\n weights = (D2 ** 2) * combined_scores\n weights[list(chosen)] = 0\n probs = weights / weights.sum()\n custom_dist = stats.rv_discrete(values=(np.arange(m), probs))\n ind = custom_dist.rvs(size=1)[0]\n while ind in chosen:\n ind = custom_dist.rvs(size=1)[0]\n chosen.add(ind)\n chosen_list.append(ind)\n # Update distances\n for j in range(m):\n if j not in chosen:\n d = distance(ind, j)\n if d < D2[j]:\n D2[j] = d\n \n return idxs_unlabeled[chosen_list]"
7}
OK: Replaced lines 28..54 with 98 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–125.
[Current file: badge/query_strategies/custom_sampling.py | editable: 28–125 | total: 125 lines]
28 class CustomSampling(Strategy):
29 """Gradient-BALD: Combines Bayesian Active Learning by Disagreement (BALD)
30 with gradient embeddings for diverse batch selection.
... (92 more lines) ...
123 D2[j] = d
124
125 return idxs_unlabeled[chosen_list]
[Test #1] (2 tests remaining; call submit(n=N) to choose which test result to submit as final)
### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/badge/run_al.py", line 230, in <module>
main()
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/badge/run_al.py", line 81, in main
data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)
### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-
Tool Input
1{
2 "n": 1
3}
Undo complete: Restored: badge/query_strategies/custom_sampling.py
Tool Input
1{
2 "op": "replace",
3 "filename": "badge/query_strategies/custom_sampling.py",
4 "start_line": 28,
5 "end_line": 54,
6 "content": "class CustomSampling(Strategy):\n \"\"\"BADGE-BALD hybrid: Combines BADGE's diverse gradient embeddings with \n BALD's Bayesian uncertainty scores for improved batch selection.\n \n Uses MC dropout to compute mutual information (BALD) to weight the\n k-means++ selection in the gradient-probability residual space.\n \"\"\"\n\n def __init__(self, X, Y, idxs_lb, net, handler, args):\n super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n self.n_drop = 10 # number of MC dropout samples\n\n def query(self, n):\n import torch\n from scipy import stats\n import numpy as np\n \n idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n if len(idxs_unlabeled) <= n:\n return idxs_unlabeled\n \n # 1. Compute BALD scores (mutual information)\n probs_dropout = self.predict_prob_dropout_split(\n self.X[idxs_unlabeled], self.Y.numpy()[idxs_unlabeled], self.n_drop\n ) # shape (n_drop, m, n_classes)\n pb = probs_dropout.mean(0) # mean over dropout samples\n entropy1 = (-pb * torch.log(pb + 1e-10)).sum(1) # H[y | x, D]\n entropy2 = (-probs_dropout * torch.log(probs_dropout + 1e-10)).sum(2).mean(0) # E_theta[H[y | x, theta]]\n bald_scores = entropy2 - entropy1 # mutual information\n bald_scores_np = bald_scores.numpy()\n \n # 2. Get embeddings and probabilities (as in BADGE)\n # Note: self.get_embedding with return_probs=True returns (embeddings, probabilities)\n embs, probs = self.get_embedding(\n self.X[idxs_unlabeled], self.Y.numpy()[idxs_unlabeled], return_probs=True\n )\n embs = embs.numpy()\n probs = probs.numpy()\n \n m = len(idxs_unlabeled)\n emb_norms_square = np.sum(embs ** 2, axis=-1)\n max_inds = np.argmax(probs, axis=-1)\n \n prob_residuals = -1.0 * probs\n prob_residuals[np.arange(m), max_inds] += 1.0\n prob_norms_square = np.sum(prob_residuals ** 2, axis=-1)\n \n # 3. 
Normalize BALD scores to [0,1] and combine with gradient-probability product\n bald_norm = (bald_scores_np - bald_scores_np.min()) / (bald_scores_np.max() - bald_scores_np.min() + 1e-10)\n # Weight the importance score by BALD\n importance = emb_norms_square * prob_norms_square * (1.0 + bald_norm) # boost by bald\n \n # k-means++ initialization with weighted probabilities\n chosen = set()\n chosen_list = []\n mu = None\n D2 = None\n \n def _distance(X1, X2, center):\n Y1, Y2 = center\n X1_vec, X1_norm_sq = X1\n X2_vec, X2_norm_sq = X2\n Y1_vec, Y1_norm_sq = Y1\n Y2_vec, Y2_norm_sq = Y2\n dist = (X1_norm_sq * X2_norm_sq + Y1_norm_sq * Y2_norm_sq\n - 2.0 * (X1_vec @ Y1_vec) * (X2_vec @ Y2_vec))\n return np.sqrt(np.clip(dist, a_min=0, a_max=None))\n \n for i in range(n):\n if len(chosen) == 0:\n # First centroid: sample with probability proportional to importance\n probs = importance / importance.sum()\n custom_dist = stats.rv_discrete(values=(np.arange(m), probs))\n ind = custom_dist.rvs(size=1)[0]\n mu = [((prob_residuals[ind], prob_norms_square[ind]),\n (embs[ind], emb_norms_square[ind]))]\n D2 = _distance(\n (prob_residuals, prob_norms_square),\n (embs, emb_norms_square),\n mu[0],\n ).ravel().astype(float)\n D2[ind] = 0\n chosen.add(ind)\n chosen_list.append(ind)\n else:\n newD = _distance(\n (prob_residuals, prob_norms_square),\n (embs, emb_norms_square),\n mu[-1],\n ).ravel().astype(float)\n D2 = np.minimum(D2, newD)\n D2[list(chosen)] = 0\n D2_sq = D2 ** 2\n # Weight D2 by BALD scores to prefer uncertain points that are also diverse\n weighted_D2 = D2_sq * (1.0 + bald_norm)\n total = weighted_D2.sum()\n if total == 0:\n # Fallback: random from remaining unlabeled\n remaining = list(set(range(m)) - chosen)\n ind = np.random.choice(remaining)\n else:\n Ddist = weighted_D2 / total\n customDist = stats.rv_discrete(\n name=\"custm\", values=(np.arange(len(Ddist)), Ddist)\n )\n ind = customDist.rvs(size=1)[0]\n while ind in chosen:\n ind = customDist.rvs(size=1)[0]\n 
mu.append(((prob_residuals[ind], prob_norms_square[ind]),\n (embs[ind], emb_norms_square[ind])))\n chosen.add(ind)\n chosen_list.append(ind)\n \n return idxs_unlabeled[chosen_list]"
7}
OK: Replaced lines 28..54 with 114 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–141.
[Current file: badge/query_strategies/custom_sampling.py | editable: 28–141 | total: 141 lines]
28 class CustomSampling(Strategy):
29 """BADGE-BALD hybrid: Combines BADGE's diverse gradient embeddings with
30 BALD's Bayesian uncertainty scores for improved batch selection.
... (108 more lines) ...
139 chosen_list.append(ind)
140
141 return idxs_unlabeled[chosen_list]
[Test #2] (1 test remaining; call submit(n=N) to choose which test result to submit as final)
### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/badge/run_al.py", line 230, in <module>
main()
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/badge/run_al.py", line 81, in main
data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)
### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-a
Tool Input
1{
2 "n": 1
3}
Undo complete: Restored: badge/query_strategies/custom_sampling.py
Tool Input
1{
2 "op": "replace",
3 "filename": "badge/query_strategies/custom_sampling.py",
4 "start_line": 28,
5 "end_line": 54,
6 "content": "class CustomSampling(Strategy):\n \"\"\"Random sampling baseline — selects samples uniformly at random.\"\"\"\n\n def __init__(self, X, Y, idxs_lb, net, handler, args):\n super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n\n def query(self, n):\n idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n return idxs_unlabeled[np.random.permutation(len(idxs_unlabeled))][:n]"
7}
OK: Replaced lines 28..54 with 9 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–36.
[Current file: badge/query_strategies/custom_sampling.py | editable: 28–36 | total: 36 lines]
28 class CustomSampling(Strategy):
29 """Random sampling baseline — selects samples uniformly at random."""
30
... (3 more lines) ...
34 def query(self, n):
35 idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]
36 return idxs_unlabeled[np.random.permutation(len(idxs_unlabeled))][:n]
[Test #3] (0 tests remaining; call submit(n=N) to choose which test result to submit as final)
[NOTE] This was your last test. You MUST now call submit(n=X) to choose which test result to submit as your final answer.
### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/badge/run_al.py", line 230, in <module>
main()
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/badge/run_al.py", line 81, in main
data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)
### spambase (scripts/spamba
Tool Input
1{
2 "n": 1
3}
Undo complete: Restored: badge/query_strategies/custom_sampling.py