Agent Conversation: google/gemini-3.1-pro-preview

Model: google/gemini-3.1-pro-preview
Total steps: 7
Messages: 30
User (Step 0)

Task: security-poison-robust-learning

Poison-Robust Learning under Fixed Label Poisoning

Research Question

How can we design a stronger loss function or sample-weighting rule that improves robustness to poisoned training labels without changing the model, optimizer, or data pipeline?

Background

A small fraction of poisoned labels can disproportionately distort model decision boundaries. Robust learning methods typically modify the objective to downweight suspicious samples or reduce memorization of corrupted targets.
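The downweighting idea can be made concrete with a minimal sketch: compute per-sample cross-entropy, then exponentially suppress samples whose loss sits above the batch average, on the assumption that poisoned labels tend to produce outlier losses. The helper name `downweighted_ce` and the decay rate `k` are illustrative, not part of the benchmark.

```python
import torch
import torch.nn.functional as F

def downweighted_ce(logits, labels, k=1.0):
    """Illustrative small-loss downweighting (hypothetical helper).

    Samples whose per-example CE exceeds the batch mean receive
    exponentially reduced weight; weights are renormalized so the
    average gradient scale is preserved.
    """
    losses = F.cross_entropy(logits, labels, reduction="none")
    with torch.no_grad():
        z = (losses - losses.mean()) / (losses.std() + 1e-8)  # batch z-score
        w = torch.exp(-k * torch.relu(z))   # weight <= 1 for above-average loss
        w = w / (w.mean() + 1e-8)           # keep overall gradient magnitude
    return (w * losses).mean()

logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = downweighted_ce(logits, labels)
```

Note that the weights are computed under `no_grad`, so gradients flow only through the weighted loss terms themselves, not through the weighting rule.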

Task

Implement a better poison-robust objective in bench/poison/custom_robust_loss.py. The fixed harness injects label corruption into the training set, trains with your loss, and evaluates on a clean test set.

Your method should improve clean test accuracy under poisoning while keeping the training procedure modular and transferable across datasets.

Editable Interface

You must implement:

class RobustLoss:
    def compute_loss(self, logits, labels, epoch):
        ...
  • logits: current minibatch model outputs
  • labels: possibly poisoned labels
  • epoch: current training epoch
  • Return value: scalar loss tensor

The corruption process, model, optimizer, and training schedule are fixed.
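Before running the harness, an implementation can be smoke-tested locally against this contract: `compute_loss` must return a scalar tensor that is differentiable with respect to the logits. The snippet below is a sketch assuming only that `torch` is installed; it uses the default cross-entropy body as the conforming example.

```python
import torch
import torch.nn.functional as F

class RobustLoss:
    """Minimal conforming implementation (the default CE objective)."""

    def compute_loss(self, logits, labels, epoch):
        return F.cross_entropy(logits, labels)

# Smoke test of the harness contract.
logits = torch.randn(32, 10, requires_grad=True)
labels = torch.randint(0, 10, (32,))
loss = RobustLoss().compute_loss(logits, labels, epoch=0)
assert loss.dim() == 0        # scalar loss tensor
loss.backward()               # must be differentiable w.r.t. logits
```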

Evaluation

Benchmarks:

  • fmnist-labelflip
  • cifar10-labelflip
  • cifar100-labelflip

Reported metrics:

  • test_acc
  • poison_fit
  • robust_score

Primary metric: robust_score (higher is better).

Baselines

  • cross_entropy: standard ERM on poisoned labels
  • generalized_ce: generalized cross-entropy for noisy labels
  • symmetric_ce: CE plus reverse-CE penalty
  • bootstrap: target interpolation with model predictions
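As a reference point, the generalized cross-entropy baseline computes L_q = (1 - p_y^q) / q on the probability assigned to the (possibly poisoned) label: as q approaches 0 this recovers cross-entropy, and at q = 1 it reduces to MAE, which is maximally noise-tolerant but slow to fit. This sketch assumes the standard formulation; the harness's exact baseline code may differ in details.

```python
import torch
import torch.nn.functional as F

def generalized_ce(logits, labels, q=0.7):
    """Sketch of the generalized_ce baseline: L_q = (1 - p_y**q) / q."""
    # Probability assigned to the given (possibly poisoned) label.
    p_y = F.softmax(logits, dim=1).gather(1, labels[:, None]).squeeze(1)
    return ((1.0 - p_y.clamp_min(1e-7).pow(q)) / q).mean()

# Confident correct prediction should incur far less loss than an unsure one.
confident = torch.tensor([[5.0, 0.0, 0.0]])
unsure = torch.tensor([[0.1, 0.0, 0.0]])
y = torch.tensor([0])
```

The gradient of L_q with respect to the logits is bounded as p_y goes to 0, which is what limits the pull of confidently wrong (poisoned) labels relative to plain cross-entropy.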

pytorch-vision/bench/poison/custom_robust_loss.py [EDITABLE — entire file only]

     1: """Editable poison-robust loss for MLS-Bench."""
     2: 
     3: import torch
     4: import torch.nn.functional as F
     5: 
     6: # ============================================================
     7: # EDITABLE
     8: # ============================================================
     9: class RobustLoss:
    10:     """Default cross-entropy objective."""
    11: 
    12:     def __init__(self):
    13:         self.q = 0.7
    14: 
    15:     def compute_loss(self, logits, labels, epoch):
    16:         return F.cross_entropy(logits, labels)
    17: # ============================================================
    18: # END EDITABLE
    19: # ============================================================

cross_entropy baseline — editable region [READ-ONLY — reference implementation]

     1: """Editable poison-robust loss for MLS-Bench."""
     2: 
     3: import torch
     4: import torch.nn.functional as F
     5: 
     6: # ============================================================
     7: # EDITABLE
     8: class RobustLoss:
     9:     """Standard cross-entropy on poisoned labels."""
    10: 
    11:     def __init__(self):
    12:       
Assistant (Step 1, edit)
Tool Input
{
  "end_line": 16,
  "filename": "bench/poison/custom_robust_loss.py",
  "start_line": 12,
  "op": "replace",
  "after_line": 1,
  "content": "    def __init__(self):\n        self.k = 1.0 # Decay rate for high-loss samples\n\n    def compute_loss(self, logits, labels, epoch):\n        losses = F.cross_entropy(logits, labels, reduction='none')\n        \n        with torch.no_grad():\n            mu = losses.mean()\n            sigma = losses.std()\n            \n            # Z-score normalization of losses\n            z = (losses - mu) / (sigma + 1e-8)\n            \n            # Suppress samples with higher than average loss\n            w = torch.exp(-self.k * F.relu(z))\n            \n            # Normalize to preserve overall gradient magnitude\n            w = w / (w.mean() + 1e-8)\n            \n        return (w * losses).mean()\n"
}
Tool Result (Step 1)
ERROR: Package 'bench' is not in allowed packages
Assistant (Step 2, edit)
Tool Input
{
  "after_line": 1,
  "start_line": 12,
  "op": "replace",
  "content": "    def __init__(self):\n        self.k = 1.0 # Decay rate for high-loss samples\n\n    def compute_loss(self, logits, labels, epoch):\n        losses = F.cross_entropy(logits, labels, reduction='none')\n        \n        with torch.no_grad():\n            mu = losses.mean()\n            sigma = losses.std()\n            \n            # Z-score normalization of losses\n            z = (losses - mu) / (sigma + 1e-8)\n            \n            # Suppress samples with higher than average loss\n            w = torch.exp(-self.k * F.relu(z))\n            \n            # Normalize to preserve overall gradient magnitude\n            w = w / (w.mean() + 1e-8)\n            \n        return (w * losses).mean()",
  "filename": "pytorch-vision/bench/poison/custom_robust_loss.py",
  "end_line": 16
}
Tool Result (Step 2)
OK: Replaced lines 12..16 with 20 line(s) in pytorch-vision/bench/poison/custom_robust_loss.py. Editable range: entire file.

[Current file: pytorch-vision/bench/poison/custom_robust_loss.py | editable: 1–34 | total: 34 lines]
     1  """Editable poison-robust loss for MLS-Bench."""
     2  
     3  import torch
     4  import torch.nn.functional as F
     5  
     6  # ============================================================
     7  # EDITABLE
     8  # ============================================================
     9  class RobustLoss:
    10      """Default cross-entropy objective."""
    11  
    12      def __init__(self):
    13          self.k = 1.0 # Decay rate for high-loss samples
    14  
    15      def compute_loss(self, logits, labels, epoch):
    16          losses = F.cross_entropy(logits, labels, reduction='none')
    17          
    18          with torch.no_grad():
    19              mu = losses.mean()
    20              sigma = losses.std()
    21          
Assistant (Step 3, test)
Tool Result (Step 3)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)


## Seed 42

### cifar10-labelflip (scripts/cifar10_labelflip.sh)
TRAIN_METRICS epoch=1 loss=2.2368
TRAIN_METRICS epoch=2 loss=2.1888
TRAIN_METRICS epoch=3 loss=1.9441
TRAIN_METRICS epoch=4 loss=1.8644
TRAIN_METRICS epoch=5 loss=1.7702
cifar10-labelflip: test_acc=0.2907, poison_fit=0.1092, robust_score=0.5907

### fmnist-labelflip (scripts/fmnist_labelflip.sh)
TRAIN_METRICS epoch=1 loss=1.9800
TRAIN_METRICS epoch=2 loss=1.2586
TRAIN_METRICS epoch=3 loss=1.0580
TRAIN_METRICS epoch=4 loss=0.9498
TRAIN_METRICS epoch=5 loss=0.8899
fmnist-labelflip: test_acc=0.6880, poison_fit=0.0310, robust_score=0.8285

### cifar100-labelflip (scripts/cifar100_labelflip.sh)
TRAIN_METRICS epoch=1 loss=4.5010
TRAIN_METRICS epoch=2 loss=4.3066
TRAIN_METRICS epoch=3 loss=4.1742
TRAIN_METRICS epoch=4 loss=4.0694
TRAIN_METRICS epoch=5 loss=4.0010
cifar100-labelflip: test_acc=0.0440, poison_fit=0.0219, robust_score=0.5110


## Seed 123

### c
Assistant (Step 4, edit)
Tool Input
{
  "end_line": 31,
  "filename": "pytorch-vision/bench/poison/custom_robust_loss.py",
  "start_line": 12,
  "op": "replace",
  "content": "    def __init__(self):\n        self.alpha = 1.0\n        self.beta = 1.0\n\n    def compute_loss(self, logits, labels, epoch):\n        probs = F.softmax(logits, dim=1).clamp_min(1e-8)\n        log_probs = torch.log(probs)\n        \n        # NCE (Normalized Cross Entropy)\n        ce = F.cross_entropy(logits, labels, reduction='none')\n        denom = -log_probs.sum(dim=1)\n        nce = (ce / (denom + 1e-8)).mean()\n        \n        # MAE / RCE (Mean Absolute Error)\n        p_y = probs.gather(1, labels[:, None]).squeeze()\n        mae = (1.0 - p_y).mean()\n        \n        return self.alpha * nce + self.beta * mae",
  "after_line": 1
}
Tool Result (Step 4)
OK: Replaced lines 12..31 with 18 line(s) in pytorch-vision/bench/poison/custom_robust_loss.py. Editable range: entire file.

[Current file: pytorch-vision/bench/poison/custom_robust_loss.py | editable: 1–32 | total: 32 lines]
     1  """Editable poison-robust loss for MLS-Bench."""
     2  
     3  import torch
     4  import torch.nn.functional as F
     5  
     6  # ============================================================
     7  # EDITABLE
     8  # ============================================================
     9  class RobustLoss:
    10      """Default cross-entropy objective."""
    11  
    12      def __init__(self):
    13          self.alpha = 1.0
    14          self.beta = 1.0
    15  
    16      def compute_loss(self, logits, labels, epoch):
    17          probs = F.softmax(logits, dim=1).clamp_min(1e-8)
    18          log_probs = torch.log(probs)
    19          
    20          # NCE (Normalized Cross Entropy)
    21          ce = F.cross_entropy(logits, labels, reduction='none')
Assistant (Step 5, test)
Tool Result (Step 5)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### cifar10-labelflip (scripts/cifar10_labelflip.sh)
TRAIN_METRICS epoch=1 loss=0.9608
TRAIN_METRICS epoch=2 loss=0.9146
TRAIN_METRICS epoch=3 loss=0.8874
TRAIN_METRICS epoch=4 loss=0.8674
TRAIN_METRICS epoch=5 loss=0.8576
cifar10-labelflip: test_acc=0.2653, poison_fit=0.0942, robust_score=0.5856

### fmnist-labelflip (scripts/fmnist_labelflip.sh)
TRAIN_METRICS epoch=1 loss=0.8808
TRAIN_METRICS epoch=2 loss=0.6419
TRAIN_METRICS epoch=3 loss=0.5780
TRAIN_METRICS epoch=4 loss=0.4973
TRAIN_METRICS epoch=5 loss=0.4674
fmnist-labelflip: test_acc=0.7060, poison_fit=0.0110, robust_score=0.8475

### cifar100-labelflip (scripts/cifar100_labelflip.sh)
TRAIN_METRICS epoch=1 loss=1.0000
TRAIN_METRICS epoch=2 loss=0.9980
TRAIN_METRICS epoch=3 loss=0.9976
TRAIN_METRICS epoch=4 loss=0.9946
TRAIN_METRICS epoch=5 loss=0.9943
cifar100-labelflip: test_acc=0.0207, poison_fit=0.0105, robust_score=0.5051

[Leaderboard] Results saved: {'te
Assistant (Step 1, edit)
Tool Input
{
  "start_line": 10,
  "end_line": 16,
  "filename": "bench/poison/custom_robust_loss.py",
  "op": "replace",
  "content": "    \"\"\"Generalized Jensen-Shannon Divergence Loss.\"\"\"\n\n    def __init__(self):\n        self.pi = 0.5\n\n    def compute_loss(self, logits, labels, epoch):\n        p = torch.softmax(logits, dim=1)\n        y = F.one_hot(labels, num_classes=logits.shape[1]).float()\n        \n        m = self.pi * y + (1 - self.pi) * p\n        m = m.clamp_min(1e-8)\n        p_safe = p.clamp_min(1e-8)\n        \n        # KL(Y || M) = - \\sum Y \\log(M) + \\sum Y \\log Y = - \\log(M_y)\n        kl_y_m = -torch.log(m.gather(1, labels[:, None])).squeeze(1)\n        \n        # KL(P || M) = \\sum P \\log(P / M)\n        kl_p_m = (p * (torch.log(p_safe) - torch.log(m))).sum(dim=1)\n        \n        loss = self.pi * kl_y_m + (1.0 - self.pi) * kl_p_m\n        return loss.mean()"
}
Tool Result (Step 1)
ERROR: Package 'bench' is not in allowed packages
Assistant (Step 2, edit)
Tool Input
{
  "filename": "pytorch-vision/bench/poison/custom_robust_loss.py",
  "content": "    \"\"\"Generalized Jensen-Shannon Divergence Loss.\"\"\"\n\n    def __init__(self):\n        self.pi = 0.5\n\n    def compute_loss(self, logits, labels, epoch):\n        p = torch.softmax(logits, dim=1)\n        y = F.one_hot(labels, num_classes=logits.shape[1]).float()\n        \n        m = self.pi * y + (1.0 - self.pi) * p\n        m = m.clamp_min(1e-8)\n        p_safe = p.clamp_min(1e-8)\n        \n        # KL(Y || M) = - \\sum Y \\log(M)\n        kl_y_m = -torch.log(m.gather(1, labels[:, None])).squeeze(1)\n        \n        # KL(P || M) = \\sum P \\log(P / M)\n        kl_p_m = (p * (torch.log(p_safe) - torch.log(m))).sum(dim=1)\n        \n        loss = self.pi * kl_y_m + (1.0 - self.pi) * kl_p_m\n        return loss.mean()",
  "start_line": 10,
  "end_line": 16,
  "op": "replace"
}
Tool Result (Step 2)
OK: Replaced lines 10..16 with 21 line(s) in pytorch-vision/bench/poison/custom_robust_loss.py. Editable range: entire file.

[Current file: pytorch-vision/bench/poison/custom_robust_loss.py | editable: 1–33 | total: 33 lines]
     1  """Editable poison-robust loss for MLS-Bench."""
     2  
     3  import torch
     4  import torch.nn.functional as F
     5  
     6  # ============================================================
     7  # EDITABLE
     8  # ============================================================
     9  class RobustLoss:
    10      """Generalized Jensen-Shannon Divergence Loss."""
    11  
    12      def __init__(self):
    13          self.pi = 0.5
    14  
    15      def compute_loss(self, logits, labels, epoch):
    16          p = torch.softmax(logits, dim=1)
    17          y = F.one_hot(labels, num_classes=logits.shape[1]).float()
    18          
    19          m = self.pi * y + (1.0 - self.pi) * p
    20          m = m.clamp_min(1e-8)
    21          
Assistant (Step 3, test)
Assistant (Step 1, edit)
Tool Input
{
  "end_line": 18,
  "op": "replace",
  "start_line": 9,
  "content": "class RobustLoss:\n    \"\"\"Dynamic Bootstrapped Generalized Cross-Entropy (DBGCE).\n    \n    Combines the robust gradient-bounding of Generalized Cross-Entropy (GCE)\n    with dynamic self-distillation (soft bootstrapping) to prevent memorization\n    of noisy labels and correct targets dynamically over the training schedule.\n    \"\"\"\n\n    def __init__(self):\n        self.q = 0.7\n\n    def compute_loss(self, logits, labels, epoch):\n        probs = torch.softmax(logits, dim=1)\n        \n        # Hard targeted labels\n        y_hard = F.one_hot(labels, num_classes=logits.shape[1]).float()\n        # Soft targeted pseudo-labels from the model's current predictions\n        y_soft = probs.detach()\n        \n        # Dynamic mixing coefficient (beta):\n        # - Epochs 0-15 (Warmup): beta=1.0. The model learns clean, easy samples \n        #   using GCE to bound the influence of noisy labels.\n        # - Epochs 15-100: beta decays linearly from 1.0 to 0.4. The model increasingly \n        #   relies on its own clean representations (self-distillation) to correct \n        #   the poisoned targets.\n        warmup = 15\n        if epoch < warmup:\n            beta = 1.0\n        else:\n            beta = max(0.4, 1.0 - 0.6 * (epoch - warmup) / (100.0 - warmup))\n            \n        target = beta * y_hard + (1.0 - beta) * y_soft\n        \n        # Element-wise Generalized Cross Entropy for soft targets\n        p_q = probs.clamp(min=1e-7).pow(self.q)\n        loss = target * (1.0 - p_q) / self.q\n        \n        return loss.sum(dim=1).mean()",
  "filename": "bench/poison/custom_robust_loss.py"
}
Tool Result (Step 1)
ERROR: Package 'bench' is not in allowed packages
Assistant (Step 2, edit)
Tool Input
{
  "start_line": 9,
  "op": "replace",
  "content": "class RobustLoss:\n    \"\"\"Dynamic Bootstrapped Generalized Cross-Entropy (DBGCE).\n    \n    Combines the robust gradient-bounding of Generalized Cross-Entropy (GCE)\n    with dynamic self-distillation (soft bootstrapping) to prevent memorization\n    of noisy labels and correct targets dynamically over the training schedule.\n    \"\"\"\n\n    def __init__(self):\n        self.q = 0.7\n\n    def compute_loss(self, logits, labels, epoch):\n        probs = torch.softmax(logits, dim=1)\n        \n        # Hard targeted labels\n        y_hard = F.one_hot(labels, num_classes=logits.shape[1]).float()\n        # Soft targeted pseudo-labels from the model's current predictions\n        y_soft = probs.detach()\n        \n        # Dynamic mixing coefficient (beta):\n        # - Epochs 0-15 (Warmup): beta=1.0. The model learns clean, easy samples \n        #   using GCE to bound the influence of noisy labels.\n        # - Epochs 15-100: beta decays linearly from 1.0 to 0.4. The model increasingly \n        #   relies on its own clean representations (self-distillation) to correct \n        #   the poisoned targets.\n        warmup = 15\n        if epoch < warmup:\n            beta = 1.0\n        else:\n            beta = max(0.4, 1.0 - 0.6 * (epoch - warmup) / (100.0 - warmup))\n            \n        target = beta * y_hard + (1.0 - beta) * y_soft\n        \n        # Element-wise Generalized Cross Entropy for soft targets\n        p_q = probs.clamp(min=1e-7).pow(self.q)\n        loss = target * (1.0 - p_q) / self.q\n        \n        return loss.sum(dim=1).mean()",
  "filename": "pytorch-vision/bench/poison/custom_robust_loss.py",
  "end_line": 18
}
Tool Result (Step 2)
OK: Replaced lines 9..18 with 38 line(s) in pytorch-vision/bench/poison/custom_robust_loss.py. Editable range: entire file.

[Current file: pytorch-vision/bench/poison/custom_robust_loss.py | editable: 1–47 | total: 47 lines]
     1  """Editable poison-robust loss for MLS-Bench."""
     2  
     3  import torch
     4  import torch.nn.functional as F
     5  
     6  # ============================================================
     7  # EDITABLE
     8  # ============================================================
     9  class RobustLoss:
    10      """Dynamic Bootstrapped Generalized Cross-Entropy (DBGCE).
    11      
    12      Combines the robust gradient-bounding of Generalized Cross-Entropy (GCE)
    13      with dynamic self-distillation (soft bootstrapping) to prevent memorization
    14      of noisy labels and correct targets dynamically over the training schedule.
    15      """
    16  
    17      def __init__(self):
    18          self.q = 0.7
    19  
    20