Agent Conversation: google/gemini-3.1-pro-preview
Task: security-poison-robust-learning
Poison-Robust Learning under Fixed Label Poisoning
Research Question
How can we design a stronger loss function or sample-weighting rule that improves robustness to poisoned training labels without changing the model, optimizer, or data pipeline?
Background
A small fraction of poisoned labels can disproportionately distort model decision boundaries. Robust learning methods typically modify the objective to downweight suspicious samples or reduce memorization of corrupted targets.
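As a concrete toy illustration (the benchmark's actual corruption process is fixed and not shown here), uniform label flipping can be simulated in a few lines; the function name and flip rate below are hypothetical, chosen only for the sketch:

```python
import random

def flip_labels(labels, num_classes, flip_frac, seed=0):
    # Toy stand-in for the harness's corruption step (hypothetical):
    # a flip_frac fraction of samples receives a uniformly random *wrong* class.
    rng = random.Random(seed)
    labels = list(labels)
    n_flip = int(flip_frac * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):
        wrong = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(wrong)
    return labels

clean = [i % 10 for i in range(1000)]
poisoned = flip_labels(clean, num_classes=10, flip_frac=0.2)
n_changed = sum(c != p for c, p in zip(clean, poisoned))  # exactly 200 here
```

Because every selected sample is moved to a class different from its original one, the number of changed labels equals the flip budget exactly.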
Task
Implement a better poison-robust objective in bench/poison/custom_robust_loss.py. The fixed harness injects label corruption into the training set, trains with your loss, and evaluates on a clean test set.
Your method should improve clean test accuracy under poisoning while keeping the training procedure modular and transferable across datasets.
Editable Interface
You must implement:
class RobustLoss:
    def compute_loss(self, logits, labels, epoch):
        ...
- logits: current minibatch model outputs
- labels: possibly poisoned labels
- epoch: current training epoch
- Return value: scalar loss tensor
The corruption process, model, optimizer, and training schedule are fixed.
Evaluation
Benchmarks:
- fmnist-labelflip
- cifar10-labelflip
- cifar100-labelflip
Reported metrics:
- test_acc
- poison_fit
- robust_score
Primary metric: robust_score (higher is better).
Baselines
- cross_entropy: standard ERM on poisoned labels
- generalized_ce: generalized cross-entropy for noisy labels
- symmetric_ce: CE plus reverse-CE penalty
- bootstrap: target interpolation with model predictions
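Of these baselines, generalized cross-entropy is the simplest to state: L_q(p_y) = (1 - p_y**q) / q, which interpolates between standard CE (q -> 0) and a bounded MAE-style loss (q = 1). A minimal numeric sketch:

```python
def gce(p_y, q):
    # Generalized cross-entropy (Zhang & Sabuncu, 2018): (1 - p_y**q) / q.
    # q -> 0 recovers -log(p_y); q = 1 gives the bounded loss 1 - p_y,
    # whose gradient stays finite even when p_y -> 0 on a poisoned label.
    return (1.0 - p_y ** q) / q

p = 0.5
bounded = gce(p, 1.0)    # 1 - p = 0.5
near_ce = gce(p, 1e-6)   # approaches -log(0.5) ~= 0.6931
```

The bounded q = 1 endpoint is what limits how much a single mislabeled sample can pull on the decision boundary.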
pytorch-vision/bench/poison/custom_robust_loss.py [EDITABLE — entire file only]
1: """Editable poison-robust loss for MLS-Bench."""
2:
3: import torch
4: import torch.nn.functional as F
5:
6: # ============================================================
7: # EDITABLE
8: # ============================================================
9: class RobustLoss:
10: """Default cross-entropy objective."""
11:
12: def __init__(self):
13: self.q = 0.7
14:
15: def compute_loss(self, logits, labels, epoch):
16: return F.cross_entropy(logits, labels)
17: # ============================================================
18: # END EDITABLE
19: # ============================================================
cross_entropy baseline — editable region [READ-ONLY — reference implementation]
1: """Editable poison-robust loss for MLS-Bench."""
2:
3: import torch
4: import torch.nn.functional as F
5:
6: # ============================================================
7: # EDITABLE
8: class RobustLoss:
9: """Standard cross-entropy on poisoned labels."""
10:
11: def __init__(self):
12:
Tool Input
1{
2 "end_line": 16,
3 "filename": "bench/poison/custom_robust_loss.py",
4 "start_line": 12,
5 "op": "replace",
6 "after_line": 1,
7 "content": " def __init__(self):\n self.k = 1.0 # Decay rate for high-loss samples\n\n def compute_loss(self, logits, labels, epoch):\n losses = F.cross_entropy(logits, labels, reduction='none')\n \n with torch.no_grad():\n mu = losses.mean()\n sigma = losses.std()\n \n # Z-score normalization of losses\n z = (losses - mu) / (sigma + 1e-8)\n \n # Suppress samples with higher than average loss\n w = torch.exp(-self.k * F.relu(z))\n \n # Normalize to preserve overall gradient magnitude\n w = w / (w.mean() + 1e-8)\n \n return (w * losses).mean()\n"
8}
ERROR: Package 'bench' is not in allowed packages
Tool Input
1{
2 "after_line": 1,
3 "start_line": 12,
4 "op": "replace",
5 "content": " def __init__(self):\n self.k = 1.0 # Decay rate for high-loss samples\n\n def compute_loss(self, logits, labels, epoch):\n losses = F.cross_entropy(logits, labels, reduction='none')\n \n with torch.no_grad():\n mu = losses.mean()\n sigma = losses.std()\n \n # Z-score normalization of losses\n z = (losses - mu) / (sigma + 1e-8)\n \n # Suppress samples with higher than average loss\n w = torch.exp(-self.k * F.relu(z))\n \n # Normalize to preserve overall gradient magnitude\n w = w / (w.mean() + 1e-8)\n \n return (w * losses).mean()",
6 "filename": "pytorch-vision/bench/poison/custom_robust_loss.py",
7 "end_line": 16
8}
OK: Replaced lines 12..16 with 20 line(s) in pytorch-vision/bench/poison/custom_robust_loss.py. Editable range: entire file.
[Current file: pytorch-vision/bench/poison/custom_robust_loss.py | editable: 1–34 | total: 34 lines]
1 """Editable poison-robust loss for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5
6 # ============================================================
7 # EDITABLE
8 # ============================================================
9 class RobustLoss:
10 """Default cross-entropy objective."""
11
12 def __init__(self):
13 self.k = 1.0 # Decay rate for high-loss samples
14
15 def compute_loss(self, logits, labels, epoch):
16 losses = F.cross_entropy(logits, labels, reduction='none')
17
18 with torch.no_grad():
19 mu = losses.mean()
20 sigma = losses.std()
21
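The z-score weighting just installed can be sanity-checked on a toy batch with a pure-Python replica (assumption: statistics.stdev matches torch.std's default unbiased estimator):

```python
import math
import statistics

def zscore_weights(losses, k=1.0):
    # Pure-Python replica of the weighting rule above: samples whose loss
    # sits above the batch mean are exponentially suppressed, then the
    # weights are renormalized so their mean stays close to 1.
    mu = statistics.mean(losses)
    sigma = statistics.stdev(losses)  # sample std, like torch.std's default
    z = [(l - mu) / (sigma + 1e-8) for l in losses]
    w = [math.exp(-k * max(0.0, zi)) for zi in z]
    m = sum(w) / len(w)
    return [wi / (m + 1e-8) for wi in w]

w = zscore_weights([0.5, 0.6, 3.0])  # the 3.0 "suspicious" sample is downweighted
```

The two low-loss samples end up with weights above 1 and the outlier well below, while the mean weight is preserved, so the overall gradient scale is unchanged.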
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
## Seed 42
### cifar10-labelflip (scripts/cifar10_labelflip.sh)
TRAIN_METRICS epoch=1 loss=2.2368
TRAIN_METRICS epoch=2 loss=2.1888
TRAIN_METRICS epoch=3 loss=1.9441
TRAIN_METRICS epoch=4 loss=1.8644
TRAIN_METRICS epoch=5 loss=1.7702
cifar10-labelflip: test_acc=0.2907, poison_fit=0.1092, robust_score=0.5907
### fmnist-labelflip (scripts/fmnist_labelflip.sh)
TRAIN_METRICS epoch=1 loss=1.9800
TRAIN_METRICS epoch=2 loss=1.2586
TRAIN_METRICS epoch=3 loss=1.0580
TRAIN_METRICS epoch=4 loss=0.9498
TRAIN_METRICS epoch=5 loss=0.8899
fmnist-labelflip: test_acc=0.6880, poison_fit=0.0310, robust_score=0.8285
### cifar100-labelflip (scripts/cifar100_labelflip.sh)
TRAIN_METRICS epoch=1 loss=4.5010
TRAIN_METRICS epoch=2 loss=4.3066
TRAIN_METRICS epoch=3 loss=4.1742
TRAIN_METRICS epoch=4 loss=4.0694
TRAIN_METRICS epoch=5 loss=4.0010
cifar100-labelflip: test_acc=0.0440, poison_fit=0.0219, robust_score=0.5110
## Seed 123
### c
Tool Input
1{
2 "end_line": 31,
3 "filename": "pytorch-vision/bench/poison/custom_robust_loss.py",
4 "start_line": 12,
5 "op": "replace",
6 "content": " def __init__(self):\n self.alpha = 1.0\n self.beta = 1.0\n\n def compute_loss(self, logits, labels, epoch):\n probs = F.softmax(logits, dim=1).clamp_min(1e-8)\n log_probs = torch.log(probs)\n \n # NCE (Normalized Cross Entropy)\n ce = F.cross_entropy(logits, labels, reduction='none')\n denom = -log_probs.sum(dim=1)\n nce = (ce / (denom + 1e-8)).mean()\n \n # MAE / RCE (Mean Absolute Error)\n p_y = probs.gather(1, labels[:, None]).squeeze()\n mae = (1.0 - p_y).mean()\n \n return self.alpha * nce + self.beta * mae",
7 "after_line": 1
8}
OK: Replaced lines 12..31 with 18 line(s) in pytorch-vision/bench/poison/custom_robust_loss.py. Editable range: entire file.
[Current file: pytorch-vision/bench/poison/custom_robust_loss.py | editable: 1–32 | total: 32 lines]
1 """Editable poison-robust loss for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5
6 # ============================================================
7 # EDITABLE
8 # ============================================================
9 class RobustLoss:
10 """Default cross-entropy objective."""
11
12 def __init__(self):
13 self.alpha = 1.0
14 self.beta = 1.0
15
16 def compute_loss(self, logits, labels, epoch):
17 probs = F.softmax(logits, dim=1).clamp_min(1e-8)
18 log_probs = torch.log(probs)
19
20 # NCE (Normalized Cross Entropy)
21         ce = F.cross_entropy(logits, labels, reduction='none')
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### cifar10-labelflip (scripts/cifar10_labelflip.sh)
TRAIN_METRICS epoch=1 loss=0.9608
TRAIN_METRICS epoch=2 loss=0.9146
TRAIN_METRICS epoch=3 loss=0.8874
TRAIN_METRICS epoch=4 loss=0.8674
TRAIN_METRICS epoch=5 loss=0.8576
cifar10-labelflip: test_acc=0.2653, poison_fit=0.0942, robust_score=0.5856
### fmnist-labelflip (scripts/fmnist_labelflip.sh)
TRAIN_METRICS epoch=1 loss=0.8808
TRAIN_METRICS epoch=2 loss=0.6419
TRAIN_METRICS epoch=3 loss=0.5780
TRAIN_METRICS epoch=4 loss=0.4973
TRAIN_METRICS epoch=5 loss=0.4674
fmnist-labelflip: test_acc=0.7060, poison_fit=0.0110, robust_score=0.8475
### cifar100-labelflip (scripts/cifar100_labelflip.sh)
TRAIN_METRICS epoch=1 loss=1.0000
TRAIN_METRICS epoch=2 loss=0.9980
TRAIN_METRICS epoch=3 loss=0.9976
TRAIN_METRICS epoch=4 loss=0.9946
TRAIN_METRICS epoch=5 loss=0.9943
cifar100-labelflip: test_acc=0.0207, poison_fit=0.0105, robust_score=0.5051
[Leaderboard] Results saved: {'te
Tool Input
1{
2 "start_line": 10,
3 "end_line": 16,
4 "filename": "bench/poison/custom_robust_loss.py",
5 "op": "replace",
6 "content": " \"\"\"Generalized Jensen-Shannon Divergence Loss.\"\"\"\n\n def __init__(self):\n self.pi = 0.5\n\n def compute_loss(self, logits, labels, epoch):\n p = torch.softmax(logits, dim=1)\n y = F.one_hot(labels, num_classes=logits.shape[1]).float()\n \n m = self.pi * y + (1 - self.pi) * p\n m = m.clamp_min(1e-8)\n p_safe = p.clamp_min(1e-8)\n \n # KL(Y || M) = - \\sum Y \\log(M) + \\sum Y \\log Y = - \\log(M_y)\n kl_y_m = -torch.log(m.gather(1, labels[:, None])).squeeze(1)\n \n # KL(P || M) = \\sum P \\log(P / M)\n kl_p_m = (p * (torch.log(p_safe) - torch.log(m))).sum(dim=1)\n \n loss = self.pi * kl_y_m + (1.0 - self.pi) * kl_p_m\n return loss.mean()"
7}
ERROR: Package 'bench' is not in allowed packages
Tool Input
1{
2 "filename": "pytorch-vision/bench/poison/custom_robust_loss.py",
3 "content": " \"\"\"Generalized Jensen-Shannon Divergence Loss.\"\"\"\n\n def __init__(self):\n self.pi = 0.5\n\n def compute_loss(self, logits, labels, epoch):\n p = torch.softmax(logits, dim=1)\n y = F.one_hot(labels, num_classes=logits.shape[1]).float()\n \n m = self.pi * y + (1.0 - self.pi) * p\n m = m.clamp_min(1e-8)\n p_safe = p.clamp_min(1e-8)\n \n # KL(Y || M) = - \\sum Y \\log(M)\n kl_y_m = -torch.log(m.gather(1, labels[:, None])).squeeze(1)\n \n # KL(P || M) = \\sum P \\log(P / M)\n kl_p_m = (p * (torch.log(p_safe) - torch.log(m))).sum(dim=1)\n \n loss = self.pi * kl_y_m + (1.0 - self.pi) * kl_p_m\n return loss.mean()",
4 "start_line": 10,
5 "end_line": 16,
6 "op": "replace"
7}
OK: Replaced lines 10..16 with 21 line(s) in pytorch-vision/bench/poison/custom_robust_loss.py. Editable range: entire file.
[Current file: pytorch-vision/bench/poison/custom_robust_loss.py | editable: 1–33 | total: 33 lines]
1 """Editable poison-robust loss for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5
6 # ============================================================
7 # EDITABLE
8 # ============================================================
9 class RobustLoss:
10 """Generalized Jensen-Shannon Divergence Loss."""
11
12 def __init__(self):
13 self.pi = 0.5
14
15 def compute_loss(self, logits, labels, epoch):
16 p = torch.softmax(logits, dim=1)
17 y = F.one_hot(labels, num_classes=logits.shape[1]).float()
18
19 m = self.pi * y + (1.0 - self.pi) * p
20 m = m.clamp_min(1e-8)
21
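The one-hot shortcut used in this GJS implementation, KL(Y || M) = -log(M[y]), can be verified numerically with a pure-Python sketch (same epsilon clamping as the loss above; the helper name is invented for illustration):

```python
import math

def gjs_check(p, y_idx, mix=0.5, eps=1e-8):
    # For a one-hot target Y, KL(Y || M) collapses to -log(M[y]): the only
    # nonzero term of sum_i Y_i * log(Y_i / M_i) is the true-class entry.
    y = [1.0 if i == y_idx else 0.0 for i in range(len(p))]
    m = [mix * yi + (1.0 - mix) * pj for yi, pj in zip(y, p)]
    shortcut = -math.log(max(m[y_idx], eps))
    full = sum(yi * math.log(yi / max(mj, eps)) for yi, mj in zip(y, m) if yi > 0)
    kl_p_m = sum(pj * math.log(max(pj, eps) / max(mj, eps)) for pj, mj in zip(p, m))
    return shortcut, full, kl_p_m

shortcut, full, kl_p_m = gjs_check([0.7, 0.2, 0.1], y_idx=1)
```

Both KL terms are nonnegative, and the shortcut agrees with the full summation to floating-point precision, confirming the gather-based line in the loss.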
Tool Input
1{
2 "end_line": 18,
3 "op": "replace",
4 "start_line": 9,
5 "content": "class RobustLoss:\n \"\"\"Dynamic Bootstrapped Generalized Cross-Entropy (DBGCE).\n \n Combines the robust gradient-bounding of Generalized Cross-Entropy (GCE)\n with dynamic self-distillation (soft bootstrapping) to prevent memorization\n of noisy labels and correct targets dynamically over the training schedule.\n \"\"\"\n\n def __init__(self):\n self.q = 0.7\n\n def compute_loss(self, logits, labels, epoch):\n probs = torch.softmax(logits, dim=1)\n \n # Hard targeted labels\n y_hard = F.one_hot(labels, num_classes=logits.shape[1]).float()\n # Soft targeted pseudo-labels from the model's current predictions\n y_soft = probs.detach()\n \n # Dynamic mixing coefficient (beta):\n # - Epochs 0-15 (Warmup): beta=1.0. The model learns clean, easy samples \n # using GCE to bound the influence of noisy labels.\n # - Epochs 15-100: beta decays linearly from 1.0 to 0.4. The model increasingly \n # relies on its own clean representations (self-distillation) to correct \n # the poisoned targets.\n warmup = 15\n if epoch < warmup:\n beta = 1.0\n else:\n beta = max(0.4, 1.0 - 0.6 * (epoch - warmup) / (100.0 - warmup))\n \n target = beta * y_hard + (1.0 - beta) * y_soft\n \n # Element-wise Generalized Cross Entropy for soft targets\n p_q = probs.clamp(min=1e-7).pow(self.q)\n loss = target * (1.0 - p_q) / self.q\n \n return loss.sum(dim=1).mean()",
6 "filename": "bench/poison/custom_robust_loss.py"
7}
ERROR: Package 'bench' is not in allowed packages
Tool Input
1{
2 "start_line": 9,
3 "op": "replace",
4 "content": "class RobustLoss:\n \"\"\"Dynamic Bootstrapped Generalized Cross-Entropy (DBGCE).\n \n Combines the robust gradient-bounding of Generalized Cross-Entropy (GCE)\n with dynamic self-distillation (soft bootstrapping) to prevent memorization\n of noisy labels and correct targets dynamically over the training schedule.\n \"\"\"\n\n def __init__(self):\n self.q = 0.7\n\n def compute_loss(self, logits, labels, epoch):\n probs = torch.softmax(logits, dim=1)\n \n # Hard targeted labels\n y_hard = F.one_hot(labels, num_classes=logits.shape[1]).float()\n # Soft targeted pseudo-labels from the model's current predictions\n y_soft = probs.detach()\n \n # Dynamic mixing coefficient (beta):\n # - Epochs 0-15 (Warmup): beta=1.0. The model learns clean, easy samples \n # using GCE to bound the influence of noisy labels.\n # - Epochs 15-100: beta decays linearly from 1.0 to 0.4. The model increasingly \n # relies on its own clean representations (self-distillation) to correct \n # the poisoned targets.\n warmup = 15\n if epoch < warmup:\n beta = 1.0\n else:\n beta = max(0.4, 1.0 - 0.6 * (epoch - warmup) / (100.0 - warmup))\n \n target = beta * y_hard + (1.0 - beta) * y_soft\n \n # Element-wise Generalized Cross Entropy for soft targets\n p_q = probs.clamp(min=1e-7).pow(self.q)\n loss = target * (1.0 - p_q) / self.q\n \n return loss.sum(dim=1).mean()",
5 "filename": "pytorch-vision/bench/poison/custom_robust_loss.py",
6 "end_line": 18
7}
OK: Replaced lines 9..18 with 38 line(s) in pytorch-vision/bench/poison/custom_robust_loss.py. Editable range: entire file.
[Current file: pytorch-vision/bench/poison/custom_robust_loss.py | editable: 1–47 | total: 47 lines]
1 """Editable poison-robust loss for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5
6 # ============================================================
7 # EDITABLE
8 # ============================================================
9 class RobustLoss:
10 """Dynamic Bootstrapped Generalized Cross-Entropy (DBGCE).
11
12 Combines the robust gradient-bounding of Generalized Cross-Entropy (GCE)
13 with dynamic self-distillation (soft bootstrapping) to prevent memorization
14 of noisy labels and correct targets dynamically over the training schedule.
15 """
16
17 def __init__(self):
18 self.q = 0.7
19
20
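The epoch-dependent mixing coefficient in this final DBGCE submission can be tabulated on its own. Note the 100-epoch horizon is hard-coded in the loss, while the harness runs above only log 5 epochs, over which beta would remain at 1.0:

```python
def beta_schedule(epoch, warmup=15, total=100.0, floor=0.4):
    # Mirrors the DBGCE mixing coefficient: full weight on the (possibly
    # poisoned) hard labels during warmup, then linear decay toward `floor`
    # as the model's own predictions take over as soft targets.
    if epoch < warmup:
        return 1.0
    return max(floor, 1.0 - 0.6 * (epoch - warmup) / (total - warmup))

betas = [beta_schedule(e) for e in (0, 14, 15, 57, 100)]
```

With a 5-epoch schedule the decay never activates, so shortening warmup (or expressing it as a fraction of total epochs) would be needed for the bootstrapping term to matter here.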