Agent Conversation: deepseek-reasoner
Task: ml-active-learning
Active Learning: Query Strategy Design
Research Question
Design a novel pool-based active learning query strategy that outperforms existing methods (uncertainty sampling, entropy sampling, BADGE, BAIT, BALD) across diverse tabular classification datasets.
Background
Active learning aims to minimize labeling cost by intelligently selecting which unlabeled samples to query for labels. In pool-based active learning, a query strategy selects batches of samples from an unlabeled pool to be labeled by an oracle, then the model is retrained. The goal is to achieve the highest possible accuracy with the fewest labeled samples.
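The query-label-retrain loop described above can be sketched end to end. Everything below is invented for illustration: a toy two-blob pool stands in for a real dataset, a nearest-centroid rule stands in for the network, and a uniform-random placeholder stands in for the query strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool: two Gaussian blobs standing in for a real tabular dataset.
X = np.concatenate([rng.normal(-2.0, 1.0, (100, 2)),
                    rng.normal(2.0, 1.0, (100, 2))])
y = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])

idxs_lb = np.zeros(len(X), dtype=bool)  # boolean mask of labeled samples
idxs_lb[:5] = True                      # seed set: 5 labeled points per class
idxs_lb[100:105] = True

def fit_centroids(X, y, mask):
    # Stand-in "model": per-class mean of the currently labeled points.
    return np.stack([X[mask & (y == c)].mean(axis=0) for c in (0, 1)])

def query_random(mask, n):
    # Placeholder query strategy: uniform over the unlabeled pool.
    return rng.choice(np.flatnonzero(~mask), size=n, replace=False)

for _ in range(5):                               # 5 rounds, batch size 10
    centroids = fit_centroids(X, y, idxs_lb)     # retrain on current labels
    idxs_lb[query_random(idxs_lb, 10)] = True    # oracle labels the batch

pred = np.argmin(((X[:, None, :] - centroids) ** 2).sum(axis=-1), axis=1)
accuracy = float((pred == y).mean())
```

Every query strategy discussed below plugs into the `query_random` slot; the loop itself stays fixed.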
Classic approaches include:
- Uncertainty Sampling: Select samples where the model is least confident (lowest max predicted probability)
- Entropy Sampling: Select samples with highest predictive entropy
- Query By Committee: Select samples with maximal disagreement among an ensemble
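Given the model's softmax outputs, the first two scores are one-liners; the probability matrix below is fabricated for illustration.

```python
import numpy as np

# Fabricated softmax outputs for 4 pool samples, 3 classes.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.34, 0.33, 0.33],
                  [0.60, 0.30, 0.10]])

# Uncertainty sampling: least confident = smallest max predicted probability.
least_confident = np.argmin(probs.max(axis=1))

# Entropy sampling: highest predictive entropy.
entropy = -(probs * np.log(probs)).sum(axis=1)
most_entropic = np.argmax(entropy)
# Both pick the near-uniform sample here (index 2).
```

The two rankings often agree at the extremes, as here, but can diverge on multi-class problems where probability mass is spread over several non-top classes.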
Modern approaches incorporate diversity and information-theoretic principles:
- BADGE (Ash et al., ICLR 2020): Uses gradient embeddings with k-means++ for diverse, uncertain batch selection
- BAIT (Ash et al., NeurIPS 2021): Optimizes Fisher information to select maximally informative batches
- BALD (Houlsby et al., 2011): Uses MC Dropout to estimate mutual information between predictions and parameters
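BALD's acquisition score is the mutual information between the prediction and the model parameters: the entropy of the dropout-averaged prediction minus the average entropy of the individual dropout predictions. A minimal NumPy sketch, with fabricated dropout outputs:

```python
import numpy as np

def bald_scores(probs_dropout, eps=1e-10):
    """Mutual information I[y; theta | x] per sample.

    probs_dropout: (n_drop, n_samples, n_classes) MC-dropout softmax outputs.
    Returns H[mean prediction] minus mean H[individual predictions].
    """
    pb = probs_dropout.mean(axis=0)                # E_theta p(y | x, theta)
    h_mean = -(pb * np.log(pb + eps)).sum(axis=1)  # H[y | x, D]
    mean_h = -(probs_dropout * np.log(probs_dropout + eps)).sum(axis=2).mean(axis=0)
    return h_mean - mean_h

# Fabricated dropout passes (n_drop=2) for two single-sample pools:
disagree = np.array([[[0.9, 0.1]], [[0.1, 0.9]]])  # confident but conflicting -> high MI
agree = np.array([[[0.5, 0.5]], [[0.5, 0.5]]])     # uncertain but consistent -> MI near 0
```

The `disagree` case is exactly what BALD prizes over plain entropy sampling: both cases have a maximally uncertain averaged prediction, but only the first reflects disagreement across parameter samples.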
Task
Modify the CustomSampling class in badge/query_strategies/custom_sampling.py to implement a novel query strategy. The strategy must implement the query(n) method that returns n indices from the unlabeled pool.
Interface
class CustomSampling(Strategy):
    def __init__(self, X, Y, idxs_lb, net, handler, args):
        super().__init__(X, Y, idxs_lb, net, handler, args)

    def query(self, n) -> np.ndarray:
        # Must return n indices into self.X of unlabeled samples to label
        ...
Available from the Strategy base class:
- self.X: pool features (numpy array, shape [n_pool, n_features])
- self.Y: pool labels (torch LongTensor, shape [n_pool])
- self.idxs_lb: boolean mask of labeled samples
- self.n_pool: total pool size
- self.predict_prob(X, Y): softmax probabilities [len(X), n_classes]
- self.predict_prob_dropout_split(X, Y, n_drop): MC dropout probs [n_drop, len(X), n_classes]
- self.get_embedding(X, Y): penultimate-layer embeddings [len(X), emb_dim]
- self.get_grad_embedding(X, Y): gradient embeddings [len(X), emb_dim * n_classes]
- self.get_exp_grad_embedding(X, Y): expected Fisher embeddings [len(X), n_classes, emb_dim]
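As a sketch of how these hooks compose, margin sampling (smallest gap between the top two softmax probabilities) can be written against plain arrays. Inside query(n), probs would come from self.predict_prob and the mask from self.idxs_lb, assuming the base-class API behaves as documented above; the helper below is self-contained so it can run on its own.

```python
import numpy as np

def margin_query(probs, idxs_lb, n):
    """Select the n unlabeled samples with the smallest top-1/top-2 margin.

    probs:   (n_pool, n_classes) softmax outputs
    idxs_lb: boolean mask of labeled samples (as in the Strategy base class)
    """
    idxs_unlabeled = np.flatnonzero(~idxs_lb)
    p = np.sort(probs[idxs_unlabeled], axis=1)  # ascending per row
    margin = p[:, -1] - p[:, -2]                # top-1 minus top-2 probability
    return idxs_unlabeled[np.argsort(margin)[:n]]

# Fabricated pool of 4 samples; sample 0 is already labeled.
probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.8, 0.2]])
idxs_lb = np.array([True, False, False, False])
picked = margin_query(probs, idxs_lb, 2)  # -> array([2, 3])
```

Note the invariant every strategy must satisfy: the returned values are indices into self.X restricted to the unlabeled pool, which is why the helper maps positions back through idxs_unlabeled.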
Evaluation
- Datasets: 3 OpenML tabular classification datasets (letter recognition, spambase, splice)
- Protocol: 20 rounds of batch active learning, evaluated after each round
- Metrics:
- accuracy: Test accuracy at the end of 20 AL rounds (fixed label budget)
- auc: Area under the learning curve (accuracy vs. number of labeled samples), measuring sample efficiency
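The auc metric is a trapezoidal area under the accuracy-vs-labels curve. The numbers below are invented, and the benchmark's exact normalization is not specified here; this sketch normalizes by the label-budget range so that constant accuracy 1.0 would score 1.0.

```python
import numpy as np

# Hypothetical learning curve: test accuracy after each AL round.
n_labeled = np.array([100, 200, 300, 400, 500])
accuracy = np.array([0.60, 0.72, 0.78, 0.81, 0.83])

# Trapezoidal area under the curve, normalized by the label-budget range.
widths = np.diff(n_labeled)
heights = (accuracy[1:] + accuracy[:-1]) / 2.0
auc = float((widths * heights).sum() / (n_labeled[-1] - n_labeled[0]))
# auc == 0.75625 here
```

Two strategies can reach the same final accuracy yet differ sharply in auc, which is why the benchmark reports both.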
Tool Input
1{
2 "op": "replace",
3 "filename": "badge/query_strategies/custom_sampling.py",
4 "start_line": 28,
5 "end_line": 54,
6 "content": "class CustomSampling(Strategy):\n \"\"\"Gradient-BALD: Combines Bayesian Active Learning by Disagreement (BALD)\n with gradient embeddings for diverse batch selection.\n \n Uses MC dropout to compute mutual information (BALD) for uncertainty,\n and gradient embeddings for diversity via k-means++ weighted by BALD scores.\n \"\"\"\n\n def __init__(self, X, Y, idxs_lb, net, handler, args):\n super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n self.n_drop = 10 # number of MC dropout samples\n\n def query(self, n):\n import torch\n from scipy import stats\n import numpy as np\n \n idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n if len(idxs_unlabeled) <= n:\n return idxs_unlabeled\n \n # 1. Compute BALD scores (mutual information)\n probs_dropout = self.predict_prob_dropout_split(\n self.X[idxs_unlabeled], self.Y.numpy()[idxs_unlabeled], self.n_drop\n ) # shape (n_drop, m, n_classes)\n pb = probs_dropout.mean(0) # mean over dropout samples\n entropy1 = (-pb * torch.log(pb + 1e-10)).sum(1) # H[y | x, D]\n entropy2 = (-probs_dropout * torch.log(probs_dropout + 1e-10)).sum(2).mean(0) # E_theta[H[y | x, theta]]\n bald_scores = entropy2 - entropy1 # mutual information\n bald_scores_np = bald_scores.numpy()\n \n # 2. Compute gradient embeddings\n grad_embs = self.get_grad_embedding(\n self.X[idxs_unlabeled], self.Y.numpy()[idxs_unlabeled]\n ) # shape (m, emb_dim * n_classes)\n grad_embs_np = grad_embs.numpy()\n \n # 3. Normalize BALD scores to [0,1] and gradient norms\n bald_scores_norm = (bald_scores_np - bald_scores_np.min()) / (bald_scores_np.max() - bald_scores_np.min() + 1e-10)\n grad_norms = np.linalg.norm(grad_embs_np, axis=1)\n grad_norms_norm = (grad_norms - grad_norms.min()) / (grad_norms.max() - grad_norms.min() + 1e-10)\n \n # 4. Combined score for initialization weighting\n combined_scores = bald_scores_norm * grad_norms_norm\n \n # 5. 
Perform k-means++ selection weighted by combined scores\n m = len(idxs_unlabeled)\n chosen = set()\n chosen_list = []\n \n # Precompute squared norms for distance calculations\n grad_norms_sq = np.sum(grad_embs_np ** 2, axis=1)\n \n def distance(i, j):\n # Euclidean distance between gradient embeddings\n return np.sqrt(grad_norms_sq[i] + grad_norms_sq[j] - 2 * np.dot(grad_embs_np[i], grad_embs_np[j]))\n \n # First centroid: sample with probability proportional to combined_scores\n if len(chosen) == 0:\n probs = combined_scores / combined_scores.sum()\n custom_dist = stats.rv_discrete(values=(np.arange(m), probs))\n ind = custom_dist.rvs(size=1)[0]\n chosen.add(ind)\n chosen_list.append(ind)\n \n # Initialize distances\n D2 = np.full(m, np.inf)\n for i in chosen:\n for j in range(m):\n if j not in chosen:\n d = distance(i, j)\n if d < D2[j]:\n D2[j] = d\n \n # Subsequent centroids: weighted by D2^2 * combined_scores\n while len(chosen) < n:\n if D2.sum() == 0:\n # Fallback: random from remaining\n remaining = list(set(range(m)) - chosen)\n ind = np.random.choice(remaining)\n else:\n weights = (D2 ** 2) * combined_scores\n weights[list(chosen)] = 0\n probs = weights / weights.sum()\n custom_dist = stats.rv_discrete(values=(np.arange(m), probs))\n ind = custom_dist.rvs(size=1)[0]\n while ind in chosen:\n ind = custom_dist.rvs(size=1)[0]\n chosen.add(ind)\n chosen_list.append(ind)\n # Update distances\n for j in range(m):\n if j not in chosen:\n d = distance(ind, j)\n if d < D2[j]:\n D2[j] = d\n \n return idxs_unlabeled[chosen_list]"
7}
OK: Replaced lines 28..54 with 98 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–125.
[Current file: badge/query_strategies/custom_sampling.py | editable: 28–125 | total: 125 lines]
28 class CustomSampling(Strategy):
29 """Gradient-BALD: Combines Bayesian Active Learning by Disagreement (BALD)
30 with gradient embeddings for diverse batch selection.
... (92 more lines) ...
123 D2[j] = d
124
125 return idxs_unlabeled[chosen_list]
[Test #1] (2 tests remaining; call submit(n=N) to choose which test result to submit as final)
### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/badge/run_al.py", line 230, in <module>
main()
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/badge/run_al.py", line 81, in main
data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)
### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-
Tool Input
1{
2 "n": 1
3}
Undo complete: Restored: badge/query_strategies/custom_sampling.py
Tool Input
1{
2 "op": "replace",
3 "filename": "badge/query_strategies/custom_sampling.py",
4 "start_line": 28,
5 "end_line": 54,
6 "content": "class CustomSampling(Strategy):\n \"\"\"BADGE-BALD hybrid: Combines BADGE's diverse gradient embeddings with \n BALD's Bayesian uncertainty scores for improved batch selection.\n \n Uses MC dropout to compute mutual information (BALD) to weight the\n k-means++ selection in the gradient-probability residual space.\n \"\"\"\n\n def __init__(self, X, Y, idxs_lb, net, handler, args):\n super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n self.n_drop = 10 # number of MC dropout samples\n\n def query(self, n):\n import torch\n from scipy import stats\n import numpy as np\n \n idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n if len(idxs_unlabeled) <= n:\n return idxs_unlabeled\n \n # 1. Compute BALD scores (mutual information)\n probs_dropout = self.predict_prob_dropout_split(\n self.X[idxs_unlabeled], self.Y.numpy()[idxs_unlabeled], self.n_drop\n ) # shape (n_drop, m, n_classes)\n pb = probs_dropout.mean(0) # mean over dropout samples\n entropy1 = (-pb * torch.log(pb + 1e-10)).sum(1) # H[y | x, D]\n entropy2 = (-probs_dropout * torch.log(probs_dropout + 1e-10)).sum(2).mean(0) # E_theta[H[y | x, theta]]\n bald_scores = entropy2 - entropy1 # mutual information\n bald_scores_np = bald_scores.numpy()\n \n # 2. Get embeddings and probabilities (as in BADGE)\n # Note: self.get_embedding with return_probs=True returns (embeddings, probabilities)\n embs, probs = self.get_embedding(\n self.X[idxs_unlabeled], self.Y.numpy()[idxs_unlabeled], return_probs=True\n )\n embs = embs.numpy()\n probs = probs.numpy()\n \n m = len(idxs_unlabeled)\n emb_norms_square = np.sum(embs ** 2, axis=-1)\n max_inds = np.argmax(probs, axis=-1)\n \n prob_residuals = -1.0 * probs\n prob_residuals[np.arange(m), max_inds] += 1.0\n prob_norms_square = np.sum(prob_residuals ** 2, axis=-1)\n \n # 3. 
Normalize BALD scores to [0,1] and combine with gradient-probability product\n bald_norm = (bald_scores_np - bald_scores_np.min()) / (bald_scores_np.max() - bald_scores_np.min() + 1e-10)\n # Weight the importance score by BALD\n importance = emb_norms_square * prob_norms_square * (1.0 + bald_norm) # boost by bald\n \n # k-means++ initialization with weighted probabilities\n chosen = set()\n chosen_list = []\n mu = None\n D2 = None\n \n def _distance(X1, X2, center):\n Y1, Y2 = center\n X1_vec, X1_norm_sq = X1\n X2_vec, X2_norm_sq = X2\n Y1_vec, Y1_norm_sq = Y1\n Y2_vec, Y2_norm_sq = Y2\n dist = (X1_norm_sq * X2_norm_sq + Y1_norm_sq * Y2_norm_sq\n - 2.0 * (X1_vec @ Y1_vec) * (X2_vec @ Y2_vec))\n return np.sqrt(np.clip(dist, a_min=0, a_max=None))\n \n for i in range(n):\n if len(chosen) == 0:\n # First centroid: sample with probability proportional to importance\n probs = importance / importance.sum()\n custom_dist = stats.rv_discrete(values=(np.arange(m), probs))\n ind = custom_dist.rvs(size=1)[0]\n mu = [((prob_residuals[ind], prob_norms_square[ind]),\n (embs[ind], emb_norms_square[ind]))]\n D2 = _distance(\n (prob_residuals, prob_norms_square),\n (embs, emb_norms_square),\n mu[0],\n ).ravel().astype(float)\n D2[ind] = 0\n chosen.add(ind)\n chosen_list.append(ind)\n else:\n newD = _distance(\n (prob_residuals, prob_norms_square),\n (embs, emb_norms_square),\n mu[-1],\n ).ravel().astype(float)\n D2 = np.minimum(D2, newD)\n D2[list(chosen)] = 0\n D2_sq = D2 ** 2\n # Weight D2 by BALD scores to prefer uncertain points that are also diverse\n weighted_D2 = D2_sq * (1.0 + bald_norm)\n total = weighted_D2.sum()\n if total == 0:\n # Fallback: random from remaining unlabeled\n remaining = list(set(range(m)) - chosen)\n ind = np.random.choice(remaining)\n else:\n Ddist = weighted_D2 / total\n customDist = stats.rv_discrete(\n name=\"custm\", values=(np.arange(len(Ddist)), Ddist)\n )\n ind = customDist.rvs(size=1)[0]\n while ind in chosen:\n ind = customDist.rvs(size=1)[0]\n 
mu.append(((prob_residuals[ind], prob_norms_square[ind]),\n (embs[ind], emb_norms_square[ind])))\n chosen.add(ind)\n chosen_list.append(ind)\n \n return idxs_unlabeled[chosen_list]"
7}
OK: Replaced lines 28..54 with 114 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–141.
[Current file: badge/query_strategies/custom_sampling.py | editable: 28–141 | total: 141 lines]
28 class CustomSampling(Strategy):
29 """BADGE-BALD hybrid: Combines BADGE's diverse gradient embeddings with
30 BALD's Bayesian uncertainty scores for improved batch selection.
... (108 more lines) ...
139 chosen_list.append(ind)
140
141 return idxs_unlabeled[chosen_list]
[Test #2] (1 test remaining; call submit(n=N) to choose which test result to submit as final)
### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/badge/run_al.py", line 230, in <module>
main()
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/badge/run_al.py", line 81, in main
data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)
### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-a
Tool Input
1{
2 "n": 1
3}
Undo complete: Restored: badge/query_strategies/custom_sampling.py
Tool Input
1{
2 "op": "replace",
3 "filename": "badge/query_strategies/custom_sampling.py",
4 "start_line": 28,
5 "end_line": 54,
6 "content": "class CustomSampling(Strategy):\n \"\"\"Random sampling baseline — selects samples uniformly at random.\"\"\"\n\n def __init__(self, X, Y, idxs_lb, net, handler, args):\n super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n\n def query(self, n):\n idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n return idxs_unlabeled[np.random.permutation(len(idxs_unlabeled))][:n]"
7}
OK: Replaced lines 28..54 with 9 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–36.
[Current file: badge/query_strategies/custom_sampling.py | editable: 28–36 | total: 36 lines]
28 class CustomSampling(Strategy):
29 """Random sampling baseline — selects samples uniformly at random."""
30
... (3 more lines) ...
34 def query(self, n):
35 idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]
36 return idxs_unlabeled[np.random.permutation(len(idxs_unlabeled))][:n]
[Test #3] (0 tests remaining; call submit(n=N) to choose which test result to submit as final)
[NOTE] This was your last test. You MUST now call submit(n=X) to choose which test result to submit as your final answer.
### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/badge/run_al.py", line 230, in <module>
main()
File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/badge/run_al.py", line 81, in main
data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/deepseek-reasoner_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)
### spambase (scripts/spamba
Tool Input
1{
2 "n": 1
3}
Undo complete: Restored: badge/query_strategies/custom_sampling.py