Agent Conversation: qwen/qwen3.6-plus

Model: qwen/qwen3.6-plus
Total steps: 20
Messages: 58
User (Step 0)

Task: ml-active-learning

Active Learning: Query Strategy Design

Research Question

Design a novel pool-based active learning query strategy that outperforms existing methods (uncertainty sampling, entropy sampling, BADGE, BAIT, BALD) across diverse tabular classification datasets.

Background

Active learning aims to minimize labeling cost by intelligently selecting which unlabeled samples to query for labels. In pool-based active learning, a query strategy selects batches of samples from an unlabeled pool to be labeled by an oracle, then the model is retrained. The goal is to achieve the highest possible accuracy with the fewest labeled samples.

Classic approaches include:

  • Uncertainty Sampling: Select samples where the model is least confident (lowest max predicted probability)
  • Entropy Sampling: Select samples with highest predictive entropy
  • Query By Committee: Select samples with maximal disagreement among an ensemble
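
The first two of these reduce to a ranking over the model's softmax outputs. A minimal sketch (the function names here are illustrative, not part of the badge framework):

```python
import numpy as np

def uncertainty_sampling(probs: np.ndarray, n: int) -> np.ndarray:
    """Select the n samples with the lowest max predicted probability."""
    max_conf = probs.max(axis=1)
    return np.argsort(max_conf)[:n]

def entropy_sampling(probs: np.ndarray, n: int) -> np.ndarray:
    """Select the n samples with the highest predictive entropy."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[-n:]
```

For binary softmax outputs the two rankings coincide; they diverge in the multi-class case, where entropy accounts for the full distribution rather than only the top class.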

Modern approaches incorporate diversity and information-theoretic principles:

  • BADGE (Ash et al., ICLR 2020): Uses gradient embeddings with k-means++ for diverse, uncertain batch selection
  • BAIT (Ash et al., NeurIPS 2021): Optimizes Fisher information to select maximally informative batches
  • BALD (Houlsby et al., 2011): Uses MC Dropout to estimate mutual information between predictions and parameters
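
The BALD score is the mutual information between a sample's label and the model parameters: the entropy of the MC-dropout mean prediction minus the mean per-draw entropy. A minimal NumPy sketch (function name illustrative):

```python
import numpy as np

def bald_scores(mc_probs: np.ndarray) -> np.ndarray:
    """mc_probs: (n_drop, n_samples, n_classes) MC-dropout probabilities.
    Returns the estimated mutual information I[y; theta | x] per sample."""
    mean_p = mc_probs.mean(axis=0)                                  # p(y|x, D)
    total_entropy = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=1)  # H[y|x, D]
    expected_entropy = -(mc_probs * np.log(mc_probs + 1e-12)).sum(axis=2).mean(axis=0)
    return total_entropy - expected_entropy
```

A sample where the dropout draws confidently disagree gets a high score (epistemic uncertainty); a sample where every draw gives the same distribution scores near zero, even if that shared distribution is itself uncertain (aleatoric noise).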

Task

Modify the CustomSampling class in badge/query_strategies/custom_sampling.py to implement a novel query strategy. The strategy must implement the query(n) method that returns n indices from the unlabeled pool.

Interface

import numpy as np

from query_strategies.strategy import Strategy


class CustomSampling(Strategy):
    def __init__(self, X, Y, idxs_lb, net, handler, args):
        super().__init__(X, Y, idxs_lb, net, handler, args)

    def query(self, n) -> np.ndarray:
        # Must return n indices into self.X of unlabeled samples to label
        ...

Available from the Strategy base class:

  • self.X: pool features (numpy array, shape [n_pool, n_features])
  • self.Y: pool labels (torch LongTensor, shape [n_pool])
  • self.idxs_lb: boolean mask of labeled samples
  • self.n_pool: total pool size
  • self.predict_prob(X, Y): softmax probabilities [len(X), n_classes]
  • self.predict_prob_dropout_split(X, Y, n_drop): MC dropout probs [n_drop, len(X), n_classes]
  • self.get_embedding(X, Y): penultimate-layer embeddings [len(X), emb_dim]
  • self.get_grad_embedding(X, Y): gradient embeddings [len(X), emb_dim * n_classes]
  • self.get_exp_grad_embedding(X, Y): expected Fisher embeddings [len(X), n_classes, emb_dim]
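
As a concrete instance of this contract, entropy sampling can be written against the same attributes. The sketch below passes the `predict_prob` output and `idxs_lb` mask in as constructor arguments so it is self-contained; in the real framework both come from the Strategy base class:

```python
import numpy as np

class EntropySamplingSketch:
    """Minimal sketch of the query(n) contract: score unlabeled samples by
    predictive entropy and return indices into the full pool."""

    def __init__(self, probs: np.ndarray, idxs_lb: np.ndarray):
        self.probs = probs        # stand-in for self.predict_prob(...)
        self.idxs_lb = idxs_lb    # boolean mask of labeled samples
        self.n_pool = len(probs)

    def query(self, n: int) -> np.ndarray:
        idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]
        p = self.probs[idxs_unlabeled]
        entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
        # Highest-entropy unlabeled samples, mapped back to pool indices
        return idxs_unlabeled[np.argsort(entropy)[-n:]]
```

Note the re-indexing step: scores are computed over the unlabeled subset, but query(n) must return indices into the full pool (self.X), so the selected positions are mapped back through idxs_unlabeled.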

Evaluation

  • Datasets: 3 OpenML tabular classification datasets (letter recognition, spambase, splice)
  • Protocol: 20 rounds of batch active learning, evaluated after each round
  • Metrics:
    • accuracy: Test accuracy at the end of 20 AL rounds (fixed label budget)
    • auc: Area under the learning curve (accuracy vs. number of labeled samples), measuring sample efficiency
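
The harness's exact AUC normalization is not shown here; one common choice is the trapezoid-rule area under the accuracy-vs-labels curve divided by the label-budget range, which reduces to plain average accuracy when the curve is flat. A sketch under that assumption:

```python
import numpy as np

def learning_curve_auc(n_labeled: np.ndarray, acc: np.ndarray) -> float:
    """Normalized area under the accuracy-vs-labels curve (trapezoid rule).
    Higher values mean the same accuracy was reached with fewer labels."""
    area = np.sum((acc[1:] + acc[:-1]) / 2.0 * np.diff(n_labeled))
    return float(area / (n_labeled[-1] - n_labeled[0]))
```

Two strategies can end the 20 rounds at the same final accuracy yet differ substantially on this metric if one reached high accuracy earlier in the label budget.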
Assistant (Step 1, edit)
Tool Input
{
  "op": "replace",
  "filename": "badge/query_strategies/custom_sampling.py",
  "content": "\"\"\"Custom active learning query strategy.\n\nThis module defines a CustomSampling strategy that inherits from the badge\nframework's Strategy base class. The agent must implement the query() method\nto select the most informative samples from the unlabeled pool.\n\nInterface contract:\n  - self.X: numpy array of all pool features, shape (n_pool, n_features)\n  - self.Y: torch LongTensor of all pool labels, shape (n_pool,)\n  - self.idxs_lb: boolean array, True for labeled samples\n  - self.n_pool: total number of pool samples\n  - self.clf: the trained neural network model\n  - self.predict_prob(X, Y): returns softmax probabilities, shape (len(X), n_classes)\n  - self.predict_prob_dropout_split(X, Y, n_drop): returns MC dropout probs, shape (n_drop, len(X), n_classes)\n  - self.get_embedding(X, Y): returns penultimate-layer embeddings, shape (len(X), emb_dim)\n  - self.get_grad_embedding(X, Y): returns gradient embeddings (for BADGE), shape (len(X), emb_dim * n_classes)\n  - self.get_exp_grad_embedding(X, Y): returns expected Fisher embeddings (for BAIT), shape (len(X), n_classes, emb_dim)\n  - query(n) must return an array of n indices into self.X (indices of the UNLABELED pool)\n\"\"\"\n\nimport numpy as np\nimport torch\nfrom scipy import stats\nfrom sklearn.metrics import pairwise_distances\nfrom query_strategies.strategy import Strategy\n\n\n# ================================================================\n# EDITABLE REGION — Implement your query strategy below (lines 28-55)\n# ================================================================\nclass CustomSampling(Strategy):\n    \"\"\"Density-Weighted Bayesian Gradient Diversity (DWBGD).\n    \n    A novel query strategy that combines:\n    1. Bayesian epistemic uncertainty via MC dropout (BALD-style mutual information)\n    2. Gradient embedding diversity via k-means++ (BADGE-style)\n    3. Density-weighted candidate filtering to avoid outlier over-selection\n    \n    The key innovation: instead of applying diversity selection on all unlabeled\n    samples or just the most uncertain ones, we compute local density in feature\n    space and use it to weight the candidate pool size adaptively. Dense regions\n    with high uncertainty are prioritized, while sparse outlier regions get\n    deprioritized — achieving a better uncertainty-representativeness tradeoff.\n    \"\"\"\n\n    def __init__(self, X, Y, idxs_lb, net, handler, args):\n        super(CustomSampling, self).__init__(X, Y, idxs_lb, net, handler, args)\n        self.n_drop = 10  # MC dropout iterations\n        self.candidate_frac = 0.5  # fraction of pool to consider for diversity selection\n        self.density_k = 5  # k for k-NN density estimation\n\n    def _compute_bald_scores(self, idxs_unlabeled):\n        \"\"\"Compute BALD mutual information scores for unlabeled samples.\"\"\"\n        probs = self.predict_prob_dropout_split(\n            self.X[idxs_unlabeled], self.Y.numpy()[idxs_unlabeled], self.n_drop\n        )\n        # Mean prediction across MC samples: p(y|x, D)\n        pb = probs.mean(0)\n        # Expected entropy: E_theta[H[y|x,theta]]\n        expected_entropy = (-probs * torch.log(probs + 1e-10)).sum(2).mean(0)\n        # Total entropy: H[y|x,D]\n        total_entropy = (-pb * torch.log(pb + 1e-10)).sum(1)\n        # BALD = total_entropy - expected_entropy (MI between y and theta)\n        bald_scores = total_entropy - expected_entropy\n        return bald_scores.cpu().numpy(), pb.cpu().numpy()\n\n    def _compute_local_density(self, idxs_unlabeled):\n        \"\"\"Compute local density using k-NN in feature space.\"\"\"\n        X_unlabeled = self.X[idxs_unlabeled]\n        n_samples = len(X_unlabeled)\n        \n        if n_samples <= self.density_k + 1:\n            return np.ones(n_samples)\n        \n        # Compute pairwise distances (use subset if too large for efficiency)\n        distances = pairwise_distances(X_unlabeled, metric='euclidean', n_jobs=-1)\n        np.fill_diagonal(distances, np.inf)\n        \n        # k-NN distances: average distance to k nearest neighbors\n        k = min(self.density_k, n_samples - 1)\n        knn_dists = np.partition(distances, k, axis=1)[:, :k]\n        avg_knn_dist = knn_dists.mean(axis=1)\n        \n        # Convert distance to density (inverse)\n        density = 1.0 / (avg_knn_dist + 1e-8)\n        \n        # Normalize to [0, 1]\n        density_min, density_max = density.min(), density.max()\n        if density_max > density_min:\n            density = (density - density_min) / (density_max - density_min)\n        else:\n            density = np.ones(n_samples)\n        \n        return density\n\n    def _badge_kmeans_plus_plus(self, embs, probs, n):\n        \"\"\"BADGE-style k-means++ selection in gradient embedding space.\"\"\"\n        m = len(probs)\n        emb_norms_square = np.sum(embs ** 2, axis=-1)\n        max_inds = np.argmax(probs, axis=-1)\n\n        prob_residuals = -1.0 * probs\n        prob_residuals[np.arange(m), max_inds] += 1.0\n        prob_norms_square = np.sum(prob_residuals ** 2, axis=-1)\n\n        # k-means++ initialization\n        chosen = set()\n        chosen_list = []\n        mu = None\n        D2 = None\n\n        def _distance(X1, X2, center):\n            Y1, Y2 = center\n            X1_vec, X1_norm_sq = X1\n            X2_vec, X2_norm_sq = X2\n            Y1_vec, Y1_norm_sq = Y1\n            Y2_vec, Y2_norm_sq = Y2\n            dist = (X1_norm_sq * X2_norm_sq + Y1_norm_sq * Y2_norm_sq\n                    - 2.0 * (X1_vec @ Y1_vec) * (X2_vec @ Y2_vec))\n            return np.sqrt(np.clip(dist, a_min=0, a_max=None))\n\n        for _ in range(n):\n            if len(chosen) == 0:\n                ind = np.argmax(emb_norms_square * prob_norms_square)\n                mu = [((prob_residuals[ind], prob_norms_square[ind]),\n                       (embs[ind], emb_norms_square[ind]))]\n                D2 = _distance(\n                    (prob_residuals, prob_norms_square),\n                    (embs, emb_norms_square),\n                    mu[0],\n                ).ravel().astype(float)\n                D2[ind] = 0\n                chosen.add(ind)\n                chosen_list.append(ind)\n            else:\n                newD = _distance(\n                    (prob_residuals, prob_norms_square),\n                    (embs, emb_norms_square),\n                    mu[-1],\n                ).ravel().astype(float)\n                D2 = np.minimum(D2, newD)\n                D2[list(chosen)] = 0\n                D2_sq = D2 ** 2\n                total = D2_sq.sum()\n                if total == 0:\n                    remaining = list(set(range(m)) - chosen)\n                    ind = np.random.choice(remaining)\n                else:\n                    Ddist = D2_sq / total\n                    customDist = stats.rv_discrete(\n                        name=\"custm\", values=(np.arange(len(Ddist)), Ddist)\n                    )\n                    ind = customDist.rvs(size=1)[0]\n                    while ind in chosen:\n                        ind = customDist.rvs(size=1)[0]\n                mu.append(((prob_residuals[ind], prob_norms_square[ind]),\n                           (embs[ind], emb_norms_square[ind])))\n                chosen.add(ind)\n                chosen_list.append(ind)\n\n        return chosen_list\n\n    def query(self, n):\n        \"\"\"Select n samples from the unlabeled pool to label next.\n        \n        Strategy:\n        1. Compute BALD mutual information scores for all unlabeled samples\n        2. Compute local density in feature space\n        3. Create a weighted score = bald_score * density^alpha\n        4. Select candidate pool (top fraction by weighted score)\n        5. Apply BADGE k-means++ on gradient embeddings within candidate pool\n        \"\"\"\n        idxs_unlabeled = np.arange(self.n_pool)[~self.idxs_lb]\n        m = len(idxs_unlabeled)\n        \n        if m <= n:\n            return idxs_unlabeled\n        \n        # Step 1: Compute BALD scores\n        bald_scores, mean_probs = self._compute_bald_scores(idxs_unlabeled)\n        \n        # Step 2: Compute local density\n        density = self._compute_local_density(idxs_unlabeled)\n        \n        # Step 3: Combine with density weighting\n        alpha = 0.3  # density exponent - balances uncertainty vs representativeness\n        weighted_scores = bald_scores * (density ** alpha)\n        \n        # Step 4: Select candidate pool for diversity selection\n        # Use adaptive candidate size: balance between quality and diversity\n        candidate_size = max(min(int(m * self.candidate_frac), int(m * 0.7)), \n                            min(4 * n, int(m * 0.3)))\n        candidate_size = min(candidate_size, m)\n        \n        if candidate_size >= m:\n            candidate_local = np.arange(m)\n        else:\n            # Select top candidates by weighted score\n            candidate_local = np.argpartition(weighted_scores, -candidate_size)[-candidate_size:]\n            candidate_local = candidate_local[np.argsort(weighted_scores[candidate_local])[::-1]]\n        \n        # Re-index to local pool\n        candidate_local = np.sort(candidate_local)\n        \n        # Step 5: Get embeddings and probabilities for candidates\n        c_idx = idxs_unlabeled[candidate_local]\n        try:\n            embs = self.get_embedding(self.X[c_idx], self.Y.numpy()[c_idx])\n            if isinstance(embs, torch.Tensor):\n                embs = embs.numpy()\n            probs = self.predict_prob(self.X[c_idx], self.Y.numpy()[c_idx])\n            if isinstance(probs, torch.Tensor):\n                probs = probs.numpy()\n            \n            cand_n = min(n, len(candidate_local))\n            chosen_local = self._badge_kmeans_plus_plus(embs, probs, cand_n)\n            return c_idx[chosen_local]\n        except Exception:\n            # Fallback to top weighted scores if embedding computation fails\n            top_inds = np.argsort(weighted_scores)[-n:]\n            return idxs_unlabeled[top_inds]\n\n# =============================================\n... (truncated)",
  "start_line": 28,
  "end_line": 51
}
Tool Result (Step 1)
OK: Replaced lines 28..51 with 226 line(s) in badge/query_strategies/custom_sampling.py. Editable range: 28–256.

[Current file: badge/query_strategies/custom_sampling.py | editable: 28–256 | total: 256 lines]
    28  """Custom active learning query strategy.
    29  
    30  This module defines a CustomSampling strategy that inherits from the badge
       ... (223 more lines) ...
   254  # ================================================================
   255  # END EDITABLE REGION
   256  # ================================================================
Assistant (Step 2, test)
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=N) to choose which test result to submit as final)

### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_110948/badge/run_al.py", line 230, in <module>
    main()
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_110948/badge/run_al.py", line 81, in main
    data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)


### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-
Assistant (Step 3, execute)
Tool Input
{
  "command": "find /workspace -name \"*.pk\" -o -name \"*.pkl\" 2>/dev/null | head -30"
}
Tool Result (Step 3)
ERROR: Unknown tool 'execute'
Assistant (Step 4, edit)
Tool Input
{
  "error": "Failed to parse tool arguments: Extra data: line 1 column 499 (char 498)"
}
Tool Result (Step 4)
ERROR: Failed to parse tool arguments: Extra data: line 1 column 499 (char 498)
Assistant (Step 5, test)
Tool Result (Step 5)
[Test #2] (1 test remaining; call submit(n=N) to choose which test result to submit as final)

### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_110948/badge/run_al.py", line 230, in <module>
    main()
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_110948/badge/run_al.py", line 81, in main
    data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)


### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-a
Assistant (Step 6, read)
Tool Input
{
  "path": "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_110948/scripts/letter.sh"
}
Tool Result (Step 6)
ERROR: Unknown tool 'read'
Assistant (Step 7, read_file)
Tool Input
{
  "path": "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_110948/badge/run_al.py"
}
Tool Result (Step 7)
ERROR: Unknown tool 'read_file'
Assistant (Step 8, read_file)
Tool Input
{
  "path": "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_110948/badge/run_al.py"
}
Tool Result (Step 8)
ERROR: Unknown tool 'read_file'
Assistant (Step 9, submit)
Tool Input
{
  "n": 1
}
Tool Result (Step 9)
[submit] Submitting result from test #1 as final.

### letter (scripts/letter.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_110948/badge/run_al.py", line 230, in <module>
    main()
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_110948/badge/run_al.py", line 81, in main
    data = pickle.load(open(data_path, "rb"))["data"]
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/badge/oml/data_6.pk'
ERROR conda.cli.main_run:execute(142): `conda run bash /srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_110948/.local_scripts/scripts/letter.sh` failed. (See above for error)


### spambase (scripts/spambase.sh)
[COMMAND FAILED exit=1]
Traceback (most recent call last):
  File "/srv/home/bohanlyu/MLS-Bench/vendor/workspace/ml-active-learning/qwen_qwen3.6-plus_20260414_11