Agent Conversation: gpt-5.4-pro
Task: security-adversarial-training
Adversarial Training for Model Robustness
Research Question
How can we design better adversarial training methods that enhance model robustness against L_inf adversarial attacks?
Background
Adversarial training is the most effective approach for improving neural network robustness against adversarial examples. The standard method (Madry et al., 2018) trains on PGD-generated adversarial examples using cross-entropy loss, but suffers from a trade-off between clean accuracy and robust accuracy. Advanced methods like TRADES and MART address this through different loss formulations that decouple the robustness objective from clean classification.
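For concreteness, the inner maximization used by the standard method is projected gradient ascent on the cross-entropy loss inside an L_inf ball. A minimal sketch (function and variable names are illustrative; inputs are assumed to lie in [0, 1]):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, eps, alpha, steps):
    """One possible PGD inner maximization (sketch, not the benchmark's code)."""
    # Random start inside the L_inf ball, clipped to the valid pixel range.
    adv = images.detach() + torch.empty_like(images).uniform_(-eps, eps)
    adv = adv.clamp(0.0, 1.0)
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        # Ascend the loss, then project back onto the eps-ball around the clean images.
        adv = adv.detach() + alpha * grad.sign()
        adv = (images + (adv - images).clamp(-eps, eps)).clamp(0.0, 1.0)
    return adv.detach()
```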
Task
Implement a novel adversarial training method in bench/custom_adv_train.py by modifying the AdversarialTrainer class. Your method should improve robust accuracy against white-box L_inf attacks while maintaining reasonable clean accuracy.
Interface
You must implement the AdversarialTrainer class with two methods:
- __init__(self, model, eps, alpha, attack_steps, num_classes, **kwargs): Initialize your trainer.
  - model: The neural network to train (nn.Module).
  - eps: L_inf perturbation budget (0.3 for MNIST, 8/255 for CIFAR).
  - alpha: Step size for the inner PGD attack.
  - attack_steps: Number of PGD steps for adversarial example generation.
  - num_classes: Number of output classes (10 or 100).
- train_step(self, images, labels, optimizer) -> dict: Perform one training step.
  - images: Clean images, shape (N, C, H, W), values in [0, 1].
  - labels: Ground truth labels, shape (N,).
  - optimizer: SGD optimizer (lr, momentum, weight_decay already configured).
  - Returns: dict with at least a 'loss' key (float).
The training loop, learning rate schedule (cosine annealing), model architecture, and data loading are handled externally. You only control the adversarial training procedure within each step.
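For orientation, a trainer that merely reproduces the pgdat baseline under this interface might look like the sketch below (a hedged reference implementation, not the method submitted in this conversation; imports are shown explicitly even though the benchmark file may already provide them):

```python
import torch
import torch.nn.functional as F

class AdversarialTrainer:
    """Plain PGD-AT baseline matching the required interface (sketch only)."""

    def __init__(self, model, eps, alpha, attack_steps, num_classes, **kwargs):
        self.model = model
        self.eps = eps
        self.alpha = alpha
        self.attack_steps = attack_steps
        self.num_classes = num_classes

    def train_step(self, images, labels, optimizer):
        # Inner maximization: PGD on cross-entropy, starting from a random point.
        self.model.eval()
        adv = images.detach() + torch.empty_like(images).uniform_(-self.eps, self.eps)
        adv = adv.clamp(0.0, 1.0)
        for _ in range(self.attack_steps):
            adv.requires_grad_(True)
            loss = F.cross_entropy(self.model(adv), labels)
            grad = torch.autograd.grad(loss, adv)[0]
            adv = adv.detach() + self.alpha * grad.sign()
            adv = (images + (adv - images).clamp(-self.eps, self.eps)).clamp(0.0, 1.0)
        # Outer minimization: standard cross-entropy on the adversarial batch.
        self.model.train()
        loss = F.cross_entropy(self.model(adv.detach()), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return {'loss': loss.item()}
```

The harness relies only on the constructor signature and the dict returned by train_step; everything inside the step is up to the implementation.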
Evaluation
After training, models are evaluated on:
- Clean accuracy: Accuracy on unperturbed test images.
- Robust accuracy (FGSM): Accuracy under 1-step FGSM attack.
- Robust accuracy (PGD-50): Accuracy under 50-step PGD attack (primary metric).
Four scenarios (model + dataset):
- SmallCNN on MNIST (eps = 0.3)
- PreActResNet-18 on CIFAR-10 (eps = 8/255)
- VGG-11-BN on CIFAR-10 (eps = 8/255)
- PreActResNet-18 on CIFAR-100 (eps = 8/255)
Higher robust accuracy (PGD-50) across all scenarios is better.
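A hedged sketch of how the PGD-50 metric is commonly computed; the benchmark's actual evaluation code, step size, and restart policy are not shown in this conversation and may differ:

```python
import torch
import torch.nn.functional as F

def robust_accuracy_pgd(model, loader, eps, alpha, device, steps=50):
    """Accuracy under an L_inf PGD attack with `steps` iterations (sketch)."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        adv = (images + torch.empty_like(images).uniform_(-eps, eps)).clamp(0.0, 1.0)
        for _ in range(steps):
            adv.requires_grad_(True)
            loss = F.cross_entropy(model(adv), labels)
            grad = torch.autograd.grad(loss, adv)[0]
            adv = adv.detach() + alpha * grad.sign()
            adv = (images + (adv - images).clamp(-eps, eps)).clamp(0.0, 1.0)
        with torch.no_grad():
            correct += (model(adv).argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return correct / total
```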
Baselines
- standard: Vanilla training (no adversarial examples). High clean accuracy, ~0% robust accuracy.
- pgdat: PGD Adversarial Training (Madry et al., 2018). Trains on PGD adversarial examples with CE loss.
- trades: TRADES (Zhang et al., 2019). Balances clean and robust accuracy via KL divergence regularization.
- mart: MART (Wang et al., 2020). Misclassification-aware regularization that focuses on hard examples.
- awp: AWP + TRADES (Wu et al., 2020). Adversarial weight perturbation.
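For comparison with the variants the agent tries below, the TRADES outer objective combines a clean cross-entropy term with a beta-weighted KL divergence between predictions on clean and adversarial inputs (beta = 6.0 is a common default; names here are illustrative):

```python
import torch.nn.functional as F

def trades_loss(logits_clean, logits_adv, labels, beta=6.0):
    """TRADES-style objective: clean CE + beta * KL(clean || adv) (sketch)."""
    loss_natural = F.cross_entropy(logits_clean, labels)
    loss_robust = F.kl_div(
        F.log_softmax(logits_adv, dim=1),
        F.softmax(logits_clean, dim=1),
        reduction='batchmean',
    )
    return loss_natural + beta * loss_robust
```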
Tool Input
1{
2 "op": "replace",
3 "filename": "torchattacks/bench/custom_adv_train.py",
4 "content": "class AdversarialTrainer:\n \"\"\"Confidence-adaptive rival-aware adversarial training with AWP.\"\"\"\n\n def __init__(self, model, eps, alpha, attack_steps, num_classes, **kwargs):\n import copy\n self.model = model\n self.eps = eps\n self.alpha = alpha\n self.attack_steps = attack_steps\n self.num_classes = num_classes\n self.beta = 6.0\n self.temp = 2.0\n self.gamma = 0.5 if num_classes <= 10 else 0.75\n self.attack_margin = 0.2\n self.momentum = 0.75\n self.awp_gamma = 0.005\n self.step = 0\n self.warmup = 200 if eps > 0.1 else 1000\n self.proxy = copy.deepcopy(model)\n self.proxy_optim = torch.optim.SGD(self.proxy.parameters(), lr=0.01)\n\n def _mix_rival(self, logits, labels):\n probs = F.softmax(logits.detach(), dim=1)\n conf = probs.gather(1, labels[:, None]).squeeze(1)\n mix = conf.clamp(0.2, 0.8)\n masked = logits.detach().clone()\n masked[torch.arange(labels.size(0), device=labels.device), labels] = -1e9\n rival = masked.argmax(dim=1)\n return mix, conf, rival\n\n def _terms(self, logits, ref_logits, labels, rival):\n t = self.temp\n ce = F.cross_entropy(logits, labels, reduction='none')\n kl = F.kl_div(\n F.log_softmax(logits / t, dim=1),\n F.softmax(ref_logits / t, dim=1),\n reduction='none',\n ).sum(dim=1) * (t * t)\n true_logit = logits.gather(1, labels[:, None]).squeeze(1)\n rival_logit = logits.gather(1, rival[:, None]).squeeze(1)\n margin = F.softplus(rival_logit - true_logit)\n return ce, kl, margin\n\n def _robust_loss(self, logits_clean, logits_adv, labels, ramp):\n mix, conf, rival = self._mix_rival(logits_clean, labels)\n ce_adv, kl_adv, margin = self._terms(logits_adv, logits_clean.detach(), labels, rival)\n robust = ((1.0 - mix) * ce_adv + self.beta * mix * kl_adv + self.gamma * (1.5 - conf) * margin).mean()\n return F.cross_entropy(logits_clean, labels) + ramp * robust\n\n def _make_adv(self, images, labels, logits_clean, ramp):\n mix, _, rival = self._mix_rival(logits_clean, labels)\n eps = self.eps * (0.1 + 0.9 * ramp)\n alpha = self.alpha * (0.1 + 0.9 * ramp)\n adv = images.detach() + torch.empty_like(images).uniform_(-eps, eps)\n adv = adv.clamp(0.0, 1.0)\n mom = torch.zeros_like(adv)\n self.model.eval()\n for _ in range(self.attack_steps):\n adv.requires_grad_(True)\n logits_adv = self.model(adv)\n ce, kl, margin = self._terms(logits_adv, logits_clean.detach(), labels, rival)\n loss = ((1.0 - mix) * ce + mix * kl + self.attack_margin * margin).mean()\n grad = torch.autograd.grad(loss, adv)[0]\n grad = grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)\n mom = self.momentum * mom + grad\n adv = adv.detach() + alpha * mom.sign()\n delta = torch.clamp(adv - images, min=-eps, max=eps)\n adv = (images + delta).clamp(0.0, 1.0).detach()\n return adv\n\n def _calc_awp(self, images, adv_images, labels, ramp):\n self.proxy.load_state_dict(self.model.state_dict())\n self.proxy.train()\n loss = -self._robust_loss(self.proxy(images), self.proxy(adv_images), labels, ramp)\n self.proxy_optim.zero_grad()\n loss.backward()\n self.proxy_optim.step()\n diff = {}\n for (name, p), (_, q) in zip(self.model.named_parameters(), self.proxy.named_parameters()):\n if p.requires_grad and p.dim() > 1:\n d = q.data - p.data\n diff[name] = d * (self.awp_gamma * p.data.norm() / (d.norm() + 1e-12))\n return diff\n\n def _perturb(self, diff):\n for name, p in self.model.named_parameters():\n if name in diff:\n p.data.add_(diff[name])\n\n def _restore(self, diff):\n for name, p in self.model.named_parameters():\n if name in diff:\n p.data.sub_(diff[name])\n\n def 
train_step(self, images, labels, optimizer):\n self.step += 1\n ramp = min(1.0, self.step / float(self.warmup))\n self.model.train()\n logits_clean = self.model(images)\n adv_images = self._make_adv(images, labels, logits_clean, ramp)\n self.model.train()\n diff = None\n if ramp > 0.2:\n diff = self._calc_awp(images, adv_images, labels, ramp)\n self._perturb(diff)\n loss = self._robust_loss(self.model(images), self.model(adv_images), labels, ramp)\n optimizer.zero_grad()\n loss.backward()\n if diff is not None:\n self._restore(diff)\n optimizer.step()\n return {'loss': loss.item()}\n",
5 "after_line": 54,
6 "start_line": 10,
7 "end_line": 54
8}OK: Replaced lines 10..54 with 112 line(s) in torchattacks/bench/custom_adv_train.py. Editable range: 10–121.
[Current file: torchattacks/bench/custom_adv_train.py | editable: 10–121 | total: 124 lines]
7| # ═══════════════════════════════════════════════════════════════════
8| # EDITABLE — implement AdversarialTrainer below
9| # ═══════════════════════════════════════════════════════════════════
10 class AdversarialTrainer:
11 """Confidence-adaptive rival-aware adversarial training with AWP."""
12
13 def __init__(self, model, eps, alpha, attack_steps, num_classes, **kwargs):
14 import copy
15 self.model = model
16 self.eps = eps
17 self.alpha = alpha
18 self.attack_steps = attack_steps
19 self.num_classes = num_classes
20 self.beta = 6.0
21 self.temp = 2.0
22 self.gamma = 0.5 if num_classes <= 10 else 0.75
23 sel
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
## Seed 42
### SmallCNN-MNIST (scripts/SmallCNN_MNIST.sh)
Training progress (SmallCNN-MNIST):
TRAIN_METRICS epoch=1 loss=3.9368 lr=0.009961
TRAIN_METRICS epoch=10 loss=1.3749 lr=0.006545
TRAIN_METRICS epoch=20 loss=0.9697 lr=0.000955
Results (SmallCNN-MNIST): clean_acc=0.9888, robust_acc_fgsm=0.9678, robust_acc_pgd=0.9426
### PreActResNet18-C10 (scripts/PreActResNet18_C10.sh)
Training progress (PreActResNet18-C10):
TRAIN_METRICS epoch=40 loss=2.7579 lr=0.050000
TRAIN_METRICS epoch=50 loss=2.6728 lr=0.030866
TRAIN_METRICS epoch=60 loss=2.5496 lr=0.014645
TRAIN_METRICS epoch=70 loss=2.4414 lr=0.003806
TRAIN_METRICS epoch=80 loss=2.3670 lr=0.000000
Results (PreActResNet18-C10): clean_acc=0.8649, robust_acc_fgsm=0.5826, robust_acc_pgd=0.4957
## Seed 123
### SmallCNN-MNIST (scripts/SmallCNN_MNIST.sh)
Training progress (SmallCNN-MNIST):
TRAIN_METRICS epoch=1 loss=3.7285 lr=0.009961
TRAIN_METRICS epoch=10 loss=1.3401
Tool Input
1{
2 "op": "replace",
3 "filename": "torchattacks/bench/custom_adv_train.py",
4 "content": "class AdversarialTrainer:\n \"\"\"Confidence-adaptive boundary TRADES with AWP.\"\"\"\n\n def __init__(self, model, eps, alpha, attack_steps, num_classes, **kwargs):\n import copy\n self.model = model\n self.eps = eps\n self.alpha = alpha\n self.attack_steps = attack_steps\n self.num_classes = num_classes\n self.beta = 6.0\n self.outer_ce = 0.5\n self.attack_ce = 1.0\n self.margin_weight = 0.25 if num_classes <= 10 else 0.35\n self.attack_margin = 0.3\n self.momentum = 0.75\n self.awp_gamma = 0.005\n self.step = 0\n self.awp_start = 500 if eps > 0.1 else 4000\n self.proxy = copy.deepcopy(model)\n self.proxy_optim = torch.optim.SGD(self.proxy.parameters(), lr=0.01)\n\n def _nat_stats(self, logits, labels):\n probs = F.softmax(logits.detach(), dim=1)\n conf = probs.gather(1, labels[:, None]).squeeze(1)\n masked = logits.detach().clone()\n masked[torch.arange(labels.size(0), device=labels.device), labels] = -1e9\n rival = masked.argmax(dim=1)\n return probs, conf, rival\n\n def _kl_per(self, logits_adv, nat_probs):\n return F.kl_div(\n F.log_softmax(logits_adv, dim=1),\n nat_probs,\n reduction='none',\n ).sum(dim=1)\n\n def _margin_per(self, logits, labels, rival):\n true_logit = logits.gather(1, labels[:, None]).squeeze(1)\n rival_logit = logits.gather(1, rival[:, None]).squeeze(1)\n return F.softplus(rival_logit - true_logit)\n\n def _robust_loss(self, logits_clean, logits_adv, labels):\n nat_probs, conf, rival = self._nat_stats(logits_clean, labels)\n hardness = 1.0 - conf\n kl = self._kl_per(logits_adv, nat_probs)\n margin = self._margin_per(logits_adv, labels, rival)\n ce_adv = F.cross_entropy(logits_adv, labels, reduction='none')\n loss_clean = F.cross_entropy(logits_clean, labels)\n loss_robust = self.beta * (kl + self.margin_weight * hardness * margin).mean()\n loss_hard = self.outer_ce * (hardness * ce_adv).mean()\n return loss_clean + loss_robust + loss_hard\n\n def _make_adv(self, images, labels, logits_clean):\n nat_probs, conf, rival = self._nat_stats(logits_clean, labels)\n hardness = 1.0 - conf\n adv = images.detach() + torch.empty_like(images).uniform_(-self.eps, self.eps)\n adv = adv.clamp(0.0, 1.0)\n mom = torch.zeros_like(adv)\n self.model.eval()\n for _ in range(self.attack_steps):\n adv.requires_grad_(True)\n logits_adv = self.model(adv)\n kl = self._kl_per(logits_adv, nat_probs)\n ce = F.cross_entropy(logits_adv, labels, reduction='none')\n margin = self._margin_per(logits_adv, labels, rival)\n loss = (kl + self.attack_ce * hardness * ce + self.attack_margin * hardness * margin).mean()\n grad = torch.autograd.grad(loss, adv)[0]\n grad = grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)\n mom = self.momentum * mom + grad\n adv = adv.detach() + self.alpha * mom.sign()\n delta = torch.clamp(adv - images, min=-self.eps, max=self.eps)\n adv = torch.clamp(images + delta, 0.0, 1.0).detach()\n return adv\n\n def _calc_awp(self, images, adv_images, labels):\n self.proxy.load_state_dict(self.model.state_dict())\n self.proxy.train()\n loss = -self._robust_loss(self.proxy(images), self.proxy(adv_images), labels)\n self.proxy_optim.zero_grad()\n loss.backward()\n self.proxy_optim.step()\n diff = {}\n for (name, p), (_, q) in zip(self.model.named_parameters(), self.proxy.named_parameters()):\n if p.requires_grad and p.dim() > 1:\n d = q.data - p.data\n diff[name] = d * (self.awp_gamma * p.data.norm() / (d.norm() + 1e-12))\n return diff\n\n def _perturb(self, diff):\n for name, p in self.model.named_parameters():\n if name in diff:\n 
p.data.add_(diff[name])\n\n def _restore(self, diff):\n for name, p in self.model.named_parameters():\n if name in diff:\n p.data.sub_(diff[name])\n\n def train_step(self, images, labels, optimizer):\n self.step += 1\n self.model.train()\n logits_clean = self.model(images)\n adv_images = self._make_adv(images, labels, logits_clean)\n self.model.train()\n diff = None\n if self.step >= self.awp_start:\n diff = self._calc_awp(images, adv_images, labels)\n self._perturb(diff)\n loss = self._robust_loss(self.model(images), self.model(adv_images), labels)\n optimizer.zero_grad()\n loss.backward()\n if diff is not None:\n self._restore(diff)\n optimizer.step()\n return {'loss': loss.item()}\n",
5 "after_line": 121,
6 "start_line": 10,
7 "end_line": 121
8}OK: Replaced lines 10..121 with 116 line(s) in torchattacks/bench/custom_adv_train.py. Editable range: 10–125.
[Current file: torchattacks/bench/custom_adv_train.py | editable: 10–125 | total: 128 lines]
7| # ═══════════════════════════════════════════════════════════════════
8| # EDITABLE — implement AdversarialTrainer below
9| # ═══════════════════════════════════════════════════════════════════
10 class AdversarialTrainer:
11 """Confidence-adaptive boundary TRADES with AWP."""
12
13 def __init__(self, model, eps, alpha, attack_steps, num_classes, **kwargs):
14 import copy
15 self.model = model
16 self.eps = eps
17 self.alpha = alpha
18 self.attack_steps = attack_steps
19 self.num_classes = num_classes
20 self.beta = 6.0
21 self.outer_ce = 0.5
22 self.attack_ce = 1.0
23 self.margin_weight = 0.25 if num_classes <
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### SmallCNN-MNIST (scripts/SmallCNN_MNIST.sh)
Training progress (SmallCNN-MNIST):
TRAIN_METRICS epoch=1 loss=3.2572 lr=0.009961
TRAIN_METRICS epoch=10 loss=0.6001 lr=0.006545
TRAIN_METRICS epoch=20 loss=0.4649 lr=0.000955
Results (SmallCNN-MNIST): clean_acc=0.9879, robust_acc_fgsm=0.9679, robust_acc_pgd=0.9424
### PreActResNet18-C10 (scripts/PreActResNet18_C10.sh)
Training progress (PreActResNet18-C10):
TRAIN_METRICS epoch=40 loss=2.9358 lr=0.050000
TRAIN_METRICS epoch=50 loss=2.8351 lr=0.030866
TRAIN_METRICS epoch=60 loss=2.7198 lr=0.014645
TRAIN_METRICS epoch=70 loss=2.5841 lr=0.003806
TRAIN_METRICS epoch=80 loss=2.4958 lr=0.000000
Results (PreActResNet18-C10): clean_acc=0.8594, robust_acc_fgsm=0.5884, robust_acc_pgd=0.5075
[Leaderboard] Results saved: {'clean_acc_SmallCNN_MNIST': 0.9879, 'robust_acc_fgsm_SmallCNN_MNIST': 0.9679, 'robust_acc_pgd_SmallCNN_MNIST': 0.9424, 'elapsed_SmallCNN-MNIST': 676, 'clean_acc_
Tool Input
1{
2 "n": 2
3}[submit] Submitting result from test #2 as final.
### SmallCNN-MNIST (scripts/SmallCNN_MNIST.sh)
Training progress (SmallCNN-MNIST):
TRAIN_METRICS epoch=1 loss=3.2572 lr=0.009961
TRAIN_METRICS epoch=10 loss=0.6001 lr=0.006545
TRAIN_METRICS epoch=20 loss=0.4649 lr=0.000955
Results (SmallCNN-MNIST): clean_acc=0.9879, robust_acc_fgsm=0.9679, robust_acc_pgd=0.9424
### PreActResNet18-C10 (scripts/PreActResNet18_C10.sh)
Training progress (PreActResNet18-C10):
TRAIN_METRICS epoch=40 loss=2.9358 lr=0.050000
TRAIN_METRICS epoch=50 loss=2.8351 lr=0.030866
TRAIN_METRICS epoch=60 loss=2.7198 lr=0.014645
TRAIN_METRICS epoch=70 loss=2.5841 lr=0.003806
TRAIN_METRICS epoch=80 loss=2.4958 lr=0.000000
Results (PreActResNet18-C10): clean_acc=0.8594, robust_acc_fgsm=0.5884, robust_acc_pgd=0.5075
### PreActResNet18-C100 (scripts/PreActResNet18_C100.sh)
Training progress (PreActResNet18-C100):
TRAIN_METRICS epoch=40 loss=7.0003 lr=0.050000
TRAIN_METRICS epoch=50 loss=6.6937 lr=0.030866
TRAIN_METRICS epoc
Tool Input
1{
2 "op": "replace",
3 "filename": "torchattacks/bench/custom_adv_train.py",
4 "content": "class AdversarialTrainer:\n \"\"\"Confidence-interpolated adversarial training with margin-aware PGD.\"\"\"\n\n def __init__(self, model, eps, alpha, attack_steps, num_classes, **kwargs):\n self.model = model\n self.eps = eps\n self.alpha = alpha\n self.attack_steps = attack_steps\n self.num_classes = num_classes\n self.beta = 6.0 if num_classes <= 10 else 5.0\n self.kl_w, self.margin_w, self.step = 0.3, 0.2, 0\n\n def _soft_ce(self, logits, target):\n return -(target * F.log_softmax(logits, dim=1)).sum(dim=1)\n\n def _attack(self, images, labels, target, clean_probs, eps, alpha):\n self.model.eval()\n adv = (images.detach() + 0.001 * torch.randn_like(images)).clamp(0.0, 1.0)\n mask = F.one_hot(labels, self.num_classes).bool()\n for _ in range(self.attack_steps):\n adv.requires_grad_(True)\n logits = self.model(adv)\n true_logit = logits.gather(1, labels[:, None]).squeeze(1)\n other_logit = logits.masked_fill(mask, -1e9).logsumexp(1)\n loss = self._soft_ce(logits, target).mean()\n loss = loss + self.kl_w * F.kl_div(F.log_softmax(logits, 1), clean_probs, reduction='batchmean')\n loss = loss + self.margin_w * (other_logit - true_logit).mean()\n grad = torch.autograd.grad(loss, adv)[0]\n adv = adv.detach() + alpha * grad.sign()\n adv = torch.max(torch.min(adv, images + eps), images - eps).clamp(0.0, 1.0)\n return adv.detach()\n\n def train_step(self, images, labels, optimizer):\n self.step += 1\n warm = min(1.0, self.step / 1500.0)\n eps = self.eps * (0.2 + 0.8 * warm)\n alpha = self.alpha * (0.2 + 0.8 * warm)\n beta = self.beta * (0.5 + 0.5 * warm)\n self.model.train()\n logits_clean = self.model(images)\n clean_probs = F.softmax(logits_clean, dim=1)\n p_true = clean_probs.gather(1, labels[:, None]).squeeze(1)\n mix = (1.0 - p_true).detach().unsqueeze(1)\n target = (1.0 - mix) * clean_probs.detach() + mix * F.one_hot(labels, self.num_classes).float()\n adv_images = self._attack(images, labels, target, clean_probs.detach(), eps, alpha)\n self.model.train()\n logits_adv = self.model(adv_images)\n kl = F.kl_div(F.log_softmax(logits_adv, 1), clean_probs.detach(), reduction='none').sum(1)\n weight = 1.0 + 0.5 * mix.squeeze(1)\n robust = (weight * (self._soft_ce(logits_adv, target) + self.kl_w * kl)).mean()\n loss = F.cross_entropy(logits_clean, labels) + beta * robust\n optimizer.zero_grad()\n loss.backward()\n optimizer.step()\n return {'loss': loss.item()}",
5 "after_line": 9,
6 "start_line": 10,
7 "end_line": 54
8}OK: Replaced lines 10..54 with 55 line(s) in torchattacks/bench/custom_adv_train.py. Editable range: 10–64.
[Current file: torchattacks/bench/custom_adv_train.py | editable: 10–64 | total: 67 lines]
7| # ═══════════════════════════════════════════════════════════════════
8| # EDITABLE — implement AdversarialTrainer below
9| # ═══════════════════════════════════════════════════════════════════
10 class AdversarialTrainer:
11 """Confidence-interpolated adversarial training with margin-aware PGD."""
12
13 def __init__(self, model, eps, alpha, attack_steps, num_classes, **kwargs):
14 self.model = model
15 self.eps = eps
16 self.alpha = alpha
17 self.attack_steps = attack_steps
18 self.num_classes = num_classes
19 self.beta = 6.0 if num_classes <= 10 else 5.0
20 self.kl_w, self.margin_w, self.step = 0.3, 0.2, 0
21
22 def _soft_ce(self, logits
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
## Seed 42
### SmallCNN-MNIST (scripts/SmallCNN_MNIST.sh)
Training progress (SmallCNN-MNIST):
TRAIN_METRICS epoch=1 loss=2.5046 lr=0.009961
TRAIN_METRICS epoch=10 loss=1.0427 lr=0.006545
TRAIN_METRICS epoch=20 loss=0.3657 lr=0.000955
Results (SmallCNN-MNIST): clean_acc=0.9920, robust_acc_fgsm=0.9640, robust_acc_pgd=0.9348
### PreActResNet18-C10 (scripts/PreActResNet18_C10.sh)
Training progress (PreActResNet18-C10):
TRAIN_METRICS epoch=40 loss=12.3032 lr=0.050000
TRAIN_METRICS epoch=50 loss=11.0394 lr=0.030866
TRAIN_METRICS epoch=60 loss=9.1577 lr=0.014645
TRAIN_METRICS epoch=70 loss=6.4921 lr=0.003806
TRAIN_METRICS epoch=80 loss=4.5936 lr=0.000000
Results (PreActResNet18-C10): clean_acc=0.8507, robust_acc_fgsm=0.5529, robust_acc_pgd=0.4516
## Seed 123
### SmallCNN-MNIST (scripts/SmallCNN_MNIST.sh)
Training progress (SmallCNN-MNIST):
TRAIN_METRICS epoch=1 loss=2.2061 lr=0.009961
TRAIN_METRICS epoch=10 loss=0.96
Tool Input
1{
2 "op": "replace",
3 "filename": "torchattacks/bench/custom_adv_train.py",
4 "content": "class AdversarialTrainer:\n \"\"\"Hardness-adaptive AWP-TRADES with margin-aware inner maximization.\"\"\"\n\n def __init__(self, model, eps, alpha, attack_steps, num_classes, **kwargs):\n import copy\n self.model, self.eps, self.alpha = model, eps, alpha\n self.attack_steps, self.num_classes = attack_steps, num_classes\n self.beta = 6.0 if num_classes <= 10 else 5.0\n self.hard_w, self.margin_w, self.awp_gamma, self.step = 1.0, 0.2, 0.005, 0\n self.proxy = copy.deepcopy(model)\n self.proxy_optim = torch.optim.SGD(self.proxy.parameters(), lr=0.01)\n\n def _loss_terms(self, logits_clean, logits_adv, labels):\n probs = F.softmax(logits_clean, dim=1)\n p_true = probs.gather(1, labels[:, None]).squeeze(1).detach()\n weight = 1.0 + self.hard_w * (1.0 - p_true)\n loss_nat = F.cross_entropy(logits_clean, labels)\n loss_rob = F.kl_div(F.log_softmax(logits_adv, dim=1), probs.detach(), reduction='none').sum(1)\n return loss_nat, (weight * loss_rob).mean()\n\n def _attack(self, images, labels):\n self.model.eval()\n with torch.no_grad():\n clean_probs = F.softmax(self.model(images), dim=1)\n adv = (images.detach() + torch.empty_like(images).uniform_(-self.eps, self.eps)).clamp(0.0, 1.0)\n mask = F.one_hot(labels, self.num_classes).bool()\n for _ in range(self.attack_steps):\n adv.requires_grad_(True)\n logits_adv = self.model(adv)\n true_logit = logits_adv.gather(1, labels[:, None]).squeeze(1)\n other_logit = logits_adv.masked_fill(mask, -1e9).max(1)[0]\n loss = F.kl_div(F.log_softmax(logits_adv, dim=1), clean_probs, reduction='batchmean')\n loss = loss + self.margin_w * (other_logit - true_logit).mean()\n grad = torch.autograd.grad(loss, adv)[0]\n adv = adv.detach() + self.alpha * grad.sign()\n adv = (images + (adv - images).clamp(-self.eps, self.eps)).clamp(0.0, 1.0).detach()\n return adv\n\n def _calc_awp(self, images, adv_images, labels, scale):\n if scale <= 0.0:\n return {}\n self.proxy.load_state_dict(self.model.state_dict())\n self.proxy.train()\n loss_nat, loss_rob = self._loss_terms(self.proxy(images), self.proxy(adv_images), labels)\n loss = -(loss_nat + self.beta * loss_rob)\n self.proxy_optim.zero_grad(); loss.backward(); self.proxy_optim.step()\n diff = {}\n for (name, param), (_, proxy_param) in zip(self.model.named_parameters(), self.proxy.named_parameters()):\n if param.ndim > 1:\n delta = proxy_param.data - param.data\n diff[name] = scale * param.data.norm() / (delta.norm() + 1e-12) * delta\n return diff\n\n def _perturb(self, diff):\n for name, param in self.model.named_parameters():\n if name in diff: param.data.add_(diff[name])\n\n def _restore(self, diff):\n for name, param in self.model.named_parameters():\n if name in diff: param.data.sub_(diff[name])\n\n def train_step(self, images, labels, optimizer):\n self.step += 1\n adv_images = self._attack(images, labels)\n diff = self._calc_awp(images, adv_images, labels, self.awp_gamma * min(1.0, self.step / 1000.0))\n self._perturb(diff)\n self.model.train()\n loss_nat, loss_rob = self._loss_terms(self.model(images), self.model(adv_images), labels)\n loss = loss_nat + self.beta * loss_rob\n optimizer.zero_grad(); loss.backward(); self._restore(diff); optimizer.step()\n return {'loss': loss.item()}",
5 "after_line": 9,
6 "start_line": 10,
7 "end_line": 64
8}OK: Replaced lines 10..64 with 71 line(s) in torchattacks/bench/custom_adv_train.py. Editable range: 10–80.
[Current file: torchattacks/bench/custom_adv_train.py | editable: 10–80 | total: 83 lines]
7| # ═══════════════════════════════════════════════════════════════════
8| # EDITABLE — implement AdversarialTrainer below
9| # ═══════════════════════════════════════════════════════════════════
10 class AdversarialTrainer:
11 """Hardness-adaptive AWP-TRADES with margin-aware inner maximization."""
12
13 def __init__(self, model, eps, alpha, attack_steps, num_classes, **kwargs):
14 import copy
15 self.model, self.eps, self.alpha = model, eps, alpha
16 self.attack_steps, self.num_classes = attack_steps, num_classes
17 self.beta = 6.0 if num_classes <= 10 else 5.0
18 self.hard_w, self.margin_w, self.awp_gamma, self.step = 1.0, 0.2, 0.005, 0
19 self.proxy = copy.deepcopy
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### SmallCNN-MNIST (scripts/SmallCNN_MNIST.sh)
Training progress (SmallCNN-MNIST):
TRAIN_METRICS epoch=1 loss=2.5601 lr=0.009961
TRAIN_METRICS epoch=10 loss=0.6109 lr=0.006545
TRAIN_METRICS epoch=20 loss=0.4418 lr=0.000955
Results (SmallCNN-MNIST): clean_acc=0.9898, robust_acc_fgsm=0.9688, robust_acc_pgd=0.9434
### PreActResNet18-C10 (scripts/PreActResNet18_C10.sh)
Training progress (PreActResNet18-C10):
TRAIN_METRICS epoch=40 loss=2.1916 lr=0.050000
TRAIN_METRICS epoch=50 loss=2.3157 lr=0.030866
TRAIN_METRICS epoch=60 loss=2.4066 lr=0.014645
TRAIN_METRICS epoch=70 loss=2.4992 lr=0.003806
TRAIN_METRICS epoch=80 loss=2.4786 lr=0.000000
Results (PreActResNet18-C10): clean_acc=0.8615, robust_acc_fgsm=0.5825, robust_acc_pgd=0.5125
[Leaderboard] Results saved: {'clean_acc_SmallCNN_MNIST': 0.9898, 'robust_acc_fgsm_SmallCNN_MNIST': 0.9688, 'robust_acc_pgd_SmallCNN_MNIST': 0.9434, 'elapsed_SmallCNN-MNIST': 632, 'clean_acc_
Tool Input
1{
2 "n": 2
3}