Agent Conversation: gpt-5.4-pro
Model: gpt-5.4-pro
Total steps: 5
Messages: 11
User (Step 0)
Task: security-adversarial-attack-sparse-l0
Sparse Adversarial Attack (L0 Constraint)
Objective
Implement a stronger sparse attack in bench/custom_attack.py.
Your method should maximize attack success rate (ASR) under a strict L0 perturbation budget:
- Threat model: full model access for custom attack implementation.
- Norm constraint: number of modified spatial pixels is bounded.
- Budget:
L0(x_adv, x) <= pixels, where pixels = 10.
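Concretely, a pixel counts as modified if any of its channels changed. A minimal sketch of the budget check (the helper name is illustrative, not part of the benchmark):

import torch

def l0_pixels(adv: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Per-sample count of modified spatial pixels for (N, C, H, W) tensors.

    A pixel counts as modified if any of its channels changed.
    """
    changed = (adv != x).any(dim=1)       # (N, H, W): pixel touched in any channel
    return changed.flatten(1).sum(dim=1)  # (N,): L0 pixel count per sample

# The budget requires l0_pixels(adv, x) <= 10 for every sample.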
Editable Interface
You must implement:
run_attack(model, images, labels, pixels, device, n_classes) -> adv_images
Inputs:
- images: tensor of shape (N, C, H, W), values in [0, 1].
- labels: tensor of shape (N,).
- pixels: maximum number of modified spatial pixels per sample.
- device: device on which the model and input tensors reside.
- n_classes: 10 for CIFAR-10, 100 for CIFAR-100.
Output:
adv_images: same shape as images, also in [0, 1].
Evaluation Protocol
Each evaluation script:
- Loads one pretrained model.
- Collects up to 1000 samples that are initially classified correctly.
- Runs your run_attack.
- Checks L0 validity (<= pixels modified spatial pixels).
- Reports: clean_acc, robust_acc, and asr = 1 - robust_acc.
Important:
- ASR denominator is the number of initially correct samples.
- Invalid adversarial outputs (shape mismatch or violated budget) are treated as failure.
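Put together, the protocol implies evaluator logic along these lines; this is an illustrative sketch of the scoring, not the actual evaluation script (function and variable names are assumptions):

import torch

def evaluate_attack(model, images, labels, pixels=10, n_classes=10):
    # Keep only the samples the model already classifies correctly;
    # these form the ASR denominator.
    with torch.no_grad():
        keep = model(images).argmax(dim=1) == labels
    x, y = images[keep], labels[keep]

    adv = run_attack(model, x, y, pixels, x.device, n_classes)

    if adv.shape != x.shape:
        return 0.0  # shape mismatch counts as failure
    l0 = (adv != x).any(dim=1).flatten(1).sum(dim=1)
    if bool((l0 > pixels).any()):
        return 0.0  # budget violation counts as failure

    with torch.no_grad():
        robust_acc = (model(adv).argmax(dim=1) == y).float().mean().item()
    return 1.0 - robust_acc  # asr = 1 - robust_acc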
Scenarios
Six scenarios are evaluated in parallel:
- ResNet20 on CIFAR-10
- VGG11-BN on CIFAR-10
- MobileNetV2 on CIFAR-10
- ResNet20 on CIFAR-100
- VGG11-BN on CIFAR-100
- MobileNetV2 on CIFAR-100
Baselines
- onepixel: one-pixel differential-evolution-based sparse baseline.
- sparsefool: gradient-based sparse perturbation baseline.
- jsma: Jacobian saliency map based targeted sparse baseline.
- pixle: pixel-rearrangement-based sparse baseline.
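These names match the standard torchattacks attack classes. A usage sketch, with the caveat that constructor keywords vary across torchattacks versions, and `model`, `images`, `labels` are assumed to be in scope:

import torchattacks

# Illustrative only: keyword arguments differ between torchattacks versions.
attacks = {
    "onepixel":   torchattacks.OnePixel(model, pixels=10),  # differential evolution
    "sparsefool": torchattacks.SparseFool(model),           # gradient-based
    "jsma":       torchattacks.JSMA(model),                 # saliency-map, targeted
    "pixle":      torchattacks.Pixle(model),                # pixel rearrangement
}
adv_images = attacks["onepixel"](images, labels)  # (N, C, H, W), values in [0, 1]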
Your goal is to improve ASR while respecting the L0 budget.
torchattacks/bench/custom_attack.py [EDITABLE — lines 3–26 only]
1: import torch
2: import torch.nn as nn
3:
4: # =====================================================================
5: # EDITABLE: implement run_attack below
6: # =====================================================================
7: def run_attack(
8: model: nn.Module,
9: images: torch.Tensor,
10: labels: torch.Tensor,
11: pixels: int,
12: device: torch.device,
13: n_classes: int,
14: ) -> torch.Tensor:
15: """
16: Sparse L0 adversarial attack.
17: images: (N, C, H, W) in [0, 1] on device. labels: (N,) on device.
18: pixels: max number of modified spatial pixels (H, W) per sample.
19: n_classes: 10 for CIFAR-10, 100 for CIFAR-100.
20: Returns adv_images satisfying an L0 pixel budget validated by evaluator.
21: """
22: _ = (model, labels, pixels, device, n_classes)
23: return images.clone()
24:
25: # =====================================================================
Assistant (Step 1): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "torchattacks/bench/custom_attack.py",
4 "content": "import torch.nn.functional as F\n\n# =====================================================================\n# EDITABLE: implement run_attack below\n# =====================================================================\ndef run_attack(\n model: nn.Module,\n images: torch.Tensor,\n labels: torch.Tensor,\n pixels: int,\n device: torch.device,\n n_classes: int,\n) -> torch.Tensor:\n \"\"\"\n Sparse L0 adversarial attack based on multi-target support refinement.\n\n The attack alternates between:\n 1) selecting a k-pixel support with a first-order boundary-gain score,\n 2) pushing the selected pixels toward loss-increasing box vertices,\n 3) refining values on the fixed support with masked gradient ascent.\n\n We restart this dynamic support search on several strong target classes and\n keep the adversarial image with the largest untargeted decision margin.\n \"\"\"\n model.eval()\n images = images.to(device)\n labels = labels.to(device)\n adv_images = images.clone()\n batch_size = min(128, images.size(0))\n\n def decision_margin(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n ar = torch.arange(logits.size(0), device=logits.device)\n true_logit = logits[ar, y]\n masked = logits.clone()\n masked[ar, y] = -1e9\n other_logit = masked.max(dim=1).values\n return other_logit - true_logit\n\n def objective(logits: torch.Tensor, y: torch.Tensor, target: torch.Tensor | None) -> torch.Tensor:\n ar = torch.arange(logits.size(0), device=logits.device)\n true_logit = logits[ar, y]\n masked = logits.clone()\n masked[ar, y] = -1e9\n other_logit = masked.max(dim=1).values\n if target is None:\n return torch.logsumexp(masked, dim=1) - true_logit\n logp = F.log_softmax(logits, dim=1)\n return 0.8 * (logp[ar, target] - logp[ar, y]) + 0.2 * (other_logit - true_logit)\n\n def attack_once(x: torch.Tensor, y: torch.Tensor, target: torch.Tensor | None) -> torch.Tensor:\n b, _, h, w = x.shape\n adv = x.clone()\n mask = torch.zeros(b, 1, h, w, dtype=torch.bool, device=x.device)\n momentum = torch.zeros_like(x)\n k = min(pixels, h * w)\n\n for _ in range(4):\n adv.requires_grad_(True)\n logits = model(adv)\n loss = objective(logits, y, target).sum()\n grad = torch.autograd.grad(loss, adv)[0]\n grad = grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)\n momentum = 0.75 * momentum + grad\n\n gain = momentum.clamp(min=0) * (1.0 - adv.detach()) + (-momentum).clamp(min=0) * adv.detach()\n gain = gain.sum(dim=1)\n gain = gain + 0.15 * gain.mean(dim=(1, 2), keepdim=True) * mask.squeeze(1).float()\n flat_gain = gain.flatten(1)\n topk = flat_gain.topk(k, dim=1).indices\n mask = torch.zeros_like(flat_gain, dtype=torch.bool).scatter(1, topk, True).view(b, 1, h, w)\n\n extreme = torch.where(momentum >= 0, torch.ones_like(adv), torch.zeros_like(adv))\n adv = adv.detach()\n adv = torch.where(mask, adv + 0.7 * (extreme - adv), x).clamp(0.0, 1.0)\n\n adv.requires_grad_(True)\n logits = model(adv)\n loss = objective(logits, y, target).sum()\n grad = torch.autograd.grad(loss, adv)[0]\n adv = (adv.detach() + 0.35 * grad.sign() * mask).clamp(0.0, 1.0)\n adv = torch.where(mask, adv, x)\n\n for _ in range(2):\n adv.requires_grad_(True)\n logits = model(adv)\n loss = objective(logits, y, target).sum()\n grad = torch.autograd.grad(loss, adv)[0]\n adv = (adv.detach() + 0.2 * grad.sign() * mask).clamp(0.0, 1.0)\n adv = torch.where(mask, adv, x)\n return adv.detach()\n\n with torch.enable_grad():\n for start in range(0, images.size(0), batch_size):\n end = start + batch_size\n x = images[start:end]\n y = 
labels[start:end]\n\n with torch.no_grad():\n clean_logits = model(x)\n order = clean_logits.argsort(dim=1, descending=True)\n alt_targets = order[order.ne(y[:, None])].view(x.size(0), -1)\n n_targets = min(3 if n_classes <= 10 else 4, alt_targets.size(1))\n best_margin = decision_margin(clean_logits, y)\n\n best_adv = x.clone()\n success = best_margin > 0\n\n for t in range(n_targets + 1):\n active_idx = (~success).nonzero(as_tuple=False).squeeze(1)\n if active_idx.numel() == 0:\n break\n xa, ya = x[active_idx], y[active_idx]\n target = alt_targets[active_idx, t] if t < n_targets else None\n cand_adv = attack_once(xa, ya, target)\n\n with torch.no_grad():\n cand_logits = model(cand_adv)\n cand_margin = decision_margin(cand_logits, ya)\n\n improve = cand_margin > best_margin[active_idx]\n if improve.any():\n chosen = active_idx[improve]\n best_adv[chosen] = cand_adv[improve]\n best_margin[chosen] = cand_margin[improve]\n success[active_idx] |= cand_margin > 0\n\n adv_images[start:end] = best_adv\n\n return adv_images.clamp(0.0, 1.0)\n\n# =====================================================================\n# END EDITABLE REGION\n# =====================================================================",
5 "after_line": 0,
6 "start_line": 3,
7 "end_line": 26
8}
Tool Result (Step 1)
OK: Replaced lines 3..26 with 133 line(s) in torchattacks/bench/custom_attack.py. Editable range: 3–135.
[Current file: torchattacks/bench/custom_attack.py | editable: 3–135 | total: 136 lines]
1| import torch
2| import torch.nn as nn
3 import torch.nn.functional as F
4
5 # =====================================================================
6 # EDITABLE: implement run_attack below
7 # =====================================================================
8 def run_attack(
9 model: nn.Module,
10 images: torch.Tensor,
11 labels: torch.Tensor,
12 pixels: int,
13 device: torch.device,
14 n_classes: int,
15 ) -> torch.Tensor:
16 """
17 Sparse L0 adversarial attack based on multi-target support refinement.
18
19 The attack alternates between:
20 1) selecting a k-pixel support with a first-order boundary-gain score,
21 2) pushing the selected pixels toward loss-increasing box vertices,
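The core of this first attempt is the boundary-gain score used to pick the pixel support: for each pixel, a first-order estimate of how much the objective can rise by pushing that pixel toward its gradient-preferred box vertex (0 or 1). A minimal self-contained sketch of that selection with illustrative names (the full implementation above also adds gradient momentum and a persistence bonus for previously selected pixels):

import torch

def select_support(grad: torch.Tensor, adv: torch.Tensor, k: int) -> torch.Tensor:
    """Pick the k spatial pixels with the largest first-order boundary gain.

    grad, adv: (N, C, H, W), adv in [0, 1]. A positive gradient component
    rewards moving a value up toward 1, a negative one rewards moving it
    down toward 0, so the attainable per-channel gain is
    g_plus * (1 - v) + g_minus * v, summed over channels.
    """
    gain = grad.clamp(min=0) * (1.0 - adv) + (-grad).clamp(min=0) * adv
    gain = gain.sum(dim=1)                         # (N, H, W)
    n, h, w = gain.shape
    topk = gain.flatten(1).topk(k, dim=1).indices  # (N, k)
    mask = torch.zeros(n, h * w, dtype=torch.bool, device=grad.device)
    mask.scatter_(1, topk, True)
    return mask.view(n, 1, h, w)                   # broadcasts over channels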
Assistant (Step 2): test
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
### ResNet20-C10 (scripts/eval_resnet20_cifar10.sh)
Attack results (ResNet20-C10): ASR=0.9700, clean_acc=1.0000, robust_acc=0.0300
### VGG11BN-C10 (scripts/eval_vgg11bn_cifar10.sh)
Attack results (VGG11BN-C10): ASR=0.9900, clean_acc=1.0000, robust_acc=0.0100
### MobileNetV2-C10 (scripts/eval_mobilenetv2_cifar10.sh)
Attack results (MobileNetV2-C10): ASR=0.9600, clean_acc=1.0000, robust_acc=0.0400
### ResNet20-C100 (scripts/eval_resnet20_cifar100.sh)
Attack results (ResNet20-C100): ASR=0.9100, clean_acc=1.0000, robust_acc=0.0900
### VGG11BN-C100 (scripts/eval_vgg11bn_cifar100.sh)
Attack results (VGG11BN-C100): ASR=0.9200, clean_acc=1.0000, robust_acc=0.0800
[Leaderboard] Results saved: {'asr_ResNet20_C10': 0.97, 'elapsed_ResNet20-C10': 7, 'asr_VGG11BN_C10': 0.99, 'elapsed_VGG11BN-C10': 7, 'asr_MobileNetV2_C10': 0.96, 'elapsed_MobileNetV2-C10': 10, 'asr_ResNet20_C100': 0.91, 'elapsed_ResNet20-C100': 10, 'asr_VGG11
Assistant (Step 3): edit
Tool Input
1{
2 "op": "replace",
3 "filename": "torchattacks/bench/custom_attack.py",
4 "content": "import torch.nn.functional as F\n\n# =====================================================================\n# EDITABLE: implement run_attack below\n# =====================================================================\ndef run_attack(\n model: nn.Module,\n images: torch.Tensor,\n labels: torch.Tensor,\n pixels: int,\n device: torch.device,\n n_classes: int,\n) -> torch.Tensor:\n \"\"\"\n Expand-prune sparse attack with support replacement.\n\n Core idea:\n 1) search over an expanded support (> budget) using box-aware gradient gain,\n 2) prune back to the exact pixel budget using contribution scores,\n 3) refine the support with one-pixel replacement moves,\n 4) repeat across several high-probability target classes and keep the best.\n \"\"\"\n model.eval()\n images = images.to(device)\n labels = labels.to(device)\n adv_images = images.clone()\n batch_size = min(128, images.size(0))\n\n def decision_margin(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n row = torch.arange(logits.size(0), device=logits.device)\n true_logit = logits[row, y]\n masked = logits.clone()\n masked[row, y] = -1e9\n other_logit = masked.max(dim=1).values\n return other_logit - true_logit\n\n def objective(\n logits: torch.Tensor,\n y: torch.Tensor,\n target: torch.Tensor | None,\n ) -> torch.Tensor:\n row = torch.arange(logits.size(0), device=logits.device)\n true_logit = logits[row, y]\n masked = logits.clone()\n masked[row, y] = -1e9\n other_logit = masked.max(dim=1).values\n if target is None:\n return torch.logsumexp(masked, dim=1) - true_logit\n logp = F.log_softmax(logits, dim=1)\n tgt_w = 0.9 if n_classes > 10 else 0.8\n return tgt_w * (logp[row, target] - logp[row, y]) + (1.0 - tgt_w) * (other_logit - true_logit)\n\n def boundary_gain(grad: torch.Tensor, point: torch.Tensor) -> torch.Tensor:\n return (grad.clamp(min=0) * (1.0 - point) + (-grad).clamp(min=0) * point).sum(dim=1)\n\n def choose_best(\n candidates: list[torch.Tensor],\n y: torch.Tensor,\n target: torch.Tensor | None,\n ) -> torch.Tensor:\n if len(candidates) == 1:\n return candidates[0]\n bsz = candidates[0].size(0)\n stacked = torch.cat(candidates, dim=0)\n logits = model(stacked)\n y_rep = y.repeat(len(candidates))\n tgt_rep = None if target is None else target.repeat(len(candidates))\n scores = objective(logits, y_rep, tgt_rep).view(len(candidates), bsz).t()\n best_idx = scores.argmax(dim=1)\n best = candidates[0].clone()\n for i, cand in enumerate(candidates):\n pick = best_idx == i\n if pick.any():\n best[pick] = cand[pick]\n return best\n\n def masked_step(\n x: torch.Tensor,\n adv: torch.Tensor,\n y: torch.Tensor,\n target: torch.Tensor | None,\n mask: torch.Tensor,\n step_size: float,\n ) -> tuple[torch.Tensor, torch.Tensor]:\n adv.requires_grad_(True)\n logits = model(adv)\n loss = objective(logits, y, target).sum()\n grad = torch.autograd.grad(loss, adv)[0]\n grad = grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)\n proposal = (adv.detach() + step_size * grad.sign() * mask).clamp(0.0, 1.0)\n proposal = torch.where(mask, proposal, x)\n extreme = torch.where(grad >= 0, torch.ones_like(adv), torch.zeros_like(adv))\n corner = torch.where(mask, adv.detach() + 0.5 * (extreme - adv.detach()), x).clamp(0.0, 1.0)\n best = choose_best([adv.detach(), proposal, corner], y, target)\n return torch.where(mask, best, x).detach(), grad.detach()\n\n def attack_once(x: torch.Tensor, y: torch.Tensor, target: torch.Tensor | None) -> torch.Tensor:\n bsz, _, h, w = x.shape\n k = min(pixels, h * w)\n k_search = min(h * w, 
max(k, 2 * k))\n adv = x.clone()\n mask = torch.zeros(bsz, 1, h, w, dtype=torch.bool, device=x.device)\n momentum = torch.zeros_like(x)\n\n for _ in range(3):\n adv.requires_grad_(True)\n logits = model(adv)\n loss = objective(logits, y, target).sum()\n grad = torch.autograd.grad(loss, adv)[0]\n grad = grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)\n momentum = 0.8 * momentum + grad\n\n gain = boundary_gain(momentum, adv.detach()).flatten(1)\n prev = mask.flatten(1).float()\n gain = gain + 0.1 * gain.mean(dim=1, keepdim=True) * prev\n top = gain.topk(k_search, dim=1).indices\n flat_mask = torch.zeros_like(gain, dtype=torch.bool)\n flat_mask.scatter_(1, top, True)\n mask = flat_mask.view(bsz, 1, h, w)\n\n extreme = torch.where(momentum >= 0, torch.ones_like(adv), torch.zeros_like(adv))\n c1 = torch.where(mask, adv.detach() + 0.55 * (extreme - adv.detach()), x).clamp(0.0, 1.0)\n c2 = torch.where(mask, adv.detach() + 0.85 * (extreme - adv.detach()), x).clamp(0.0, 1.0)\n c3 = torch.where(mask, extreme, x)\n adv = choose_best([adv.detach(), c1, c2, c3], y, target)\n adv = torch.where(mask, adv, x)\n adv, step_grad = masked_step(x, adv, y, target, mask, 0.28)\n momentum = 0.7 * momentum + 0.3 * step_grad\n\n adv.requires_grad_(True)\n logits = model(adv)\n loss = objective(logits, y, target).sum()\n grad = torch.autograd.grad(loss, adv)[0]\n grad = grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)\n\n if k_search > k:\n contrib = ((adv.detach() - x).abs() * (0.25 + grad.abs())).sum(dim=1).flatten(1)\n flat_mask = mask.flatten(1)\n contrib = contrib.masked_fill(~flat_mask, -1e9)\n keep = contrib.topk(k, dim=1).indices\n pruned = torch.zeros_like(flat_mask)\n pruned.scatter_(1, keep, True)\n mask = pruned.view(bsz, 1, h, w)\n adv = torch.where(mask, adv.detach(), x)\n adv, grad = masked_step(x, adv, y, target, mask, 0.24)\n\n for _ in range(2):\n flat_mask = mask.flatten(1)\n outside = boundary_gain(grad, adv.detach()).flatten(1).masked_fill(flat_mask, -1e9)\n inside = ((adv.detach() - x).abs() * (0.25 + grad.abs())).sum(dim=1).flatten(1)\n inside = inside.masked_fill(~flat_mask, 1e9)\n best_out = outside.argmax(dim=1, keepdim=True)\n worst_in = inside.argmin(dim=1, keepdim=True)\n row = torch.arange(bsz, device=x.device)[:, None]\n swapped = flat_mask.clone()\n swapped[row, worst_in] = False\n swapped[row, best_out] = True\n mask = swapped.view(bsz, 1, h, w)\n adv = torch.where(mask, adv.detach(), x)\n\n extreme = torch.where(grad >= 0, torch.ones_like(adv), torch.zeros_like(adv))\n moved = torch.where(mask, adv + 0.65 * (extreme - adv), x).clamp(0.0, 1.0)\n adv = choose_best([adv.detach(), moved], y, target)\n adv, grad = masked_step(x, adv, y, target, mask, 0.18)\n\n for step_size in (0.14, 0.10):\n adv, _ = masked_step(x, adv, y, target, mask, step_size)\n return adv.detach()\n\n with torch.enable_grad():\n for start in range(0, images.size(0), batch_size):\n end = start + batch_size\n x = images[start:end]\n y = labels[start:end]\n\n with torch.no_grad():\n clean_logits = model(x)\n order = clean_logits.argsort(dim=1, descending=True)\n alt_targets = order[order.ne(y[:, None])].view(x.size(0), -1)\n n_targets = min(4 if n_classes <= 10 else 6, alt_targets.size(1))\n best_margin = decision_margin(clean_logits, y)\n\n best_adv = x.clone()\n success = best_margin > 0\n\n for t in range(n_targets + 1):\n active = (~success).nonzero(as_tuple=False).squeeze(1)\n if active.numel() == 0:\n break\n xa, ya = x[active], y[active]\n target = alt_targets[active, t] if t < 
n_targets else None\n cand_adv = attack_once(xa, ya, target)\n\n with torch.no_grad():\n cand_logits = model(cand_adv)\n cand_margin = decision_margin(cand_logits, ya)\n\n improve = cand_margin > best_margin[active]\n if improve.any():\n chosen = active[improve]\n best_adv[chosen] = cand_adv[improve]\n best_margin[chosen] = cand_margin[improve]\n success[active] |= cand_margin > 0\n\n adv_images[start:end] = best_adv\n\n return adv_images.clamp(0.0, 1.0)\n\n# =====================================================================\n# END EDITABLE REGION\n# =====================================================================",
5 "after_line": 0,
6 "start_line": 3,
7 "end_line": 135
8}
Tool Result (Step 3)
OK: Replaced lines 3..135 with 211 line(s) in torchattacks/bench/custom_attack.py. Editable range: 3–213.
[Current file: torchattacks/bench/custom_attack.py | editable: 3–213 | total: 214 lines]
1| import torch
2| import torch.nn as nn
3 import torch.nn.functional as F
4
5 # =====================================================================
6 # EDITABLE: implement run_attack below
7 # =====================================================================
8 def run_attack(
9 model: nn.Module,
10 images: torch.Tensor,
11 labels: torch.Tensor,
12 pixels: int,
13 device: torch.device,
14 n_classes: int,
15 ) -> torch.Tensor:
16 """
17 Expand-prune sparse attack with support replacement.
18
19 Core idea:
20 1) search over an expanded support (> budget) using box-aware gradient gain,
21 2) prune back to the exact pixel budget using contribution scores,
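The main change over the first attempt is the expand-prune schedule: the support search runs with roughly twice the pixel budget (k_search in the code above), then prunes back to exactly k pixels, ranked by how much perturbation each carries weighted by local gradient magnitude. A hedged sketch of the prune step with illustrative names:

import torch

def prune_support(adv, x, grad, mask, k):
    """Shrink an expanded support mask back to exactly k pixels per sample.

    adv, x, grad: (N, C, H, W); mask: (N, 1, H, W) boolean support with more
    than k active pixels. Pixels are ranked by the perturbation they carry,
    weighted by gradient magnitude; dropped pixels revert to clean values.
    """
    contrib = ((adv - x).abs() * (0.25 + grad.abs())).sum(dim=1).flatten(1)
    contrib = contrib.masked_fill(~mask.flatten(1), float("-inf"))
    keep = contrib.topk(k, dim=1).indices          # (N, k) surviving pixels
    pruned = torch.zeros_like(mask.flatten(1))
    pruned.scatter_(1, keep, True)
    pruned = pruned.view_as(mask)
    return torch.where(pruned, adv, x), pruned     # reset dropped pixels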
Assistant (Step 4): test
Tool Result (Step 4)
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### ResNet20-C10 (scripts/eval_resnet20_cifar10.sh)
Attack results (ResNet20-C10): ASR=0.9900, clean_acc=1.0000, robust_acc=0.0100
### VGG11BN-C10 (scripts/eval_vgg11bn_cifar10.sh)
Attack results (VGG11BN-C10): ASR=1.0000, clean_acc=1.0000, robust_acc=0.0000
### MobileNetV2-C10 (scripts/eval_mobilenetv2_cifar10.sh)
Attack results (MobileNetV2-C10): ASR=0.9900, clean_acc=1.0000, robust_acc=0.0100
### ResNet20-C100 (scripts/eval_resnet20_cifar100.sh)
Attack results (ResNet20-C100): ASR=0.9700, clean_acc=1.0000, robust_acc=0.0300
### VGG11BN-C100 (scripts/eval_vgg11bn_cifar100.sh)
Attack results (VGG11BN-C100): ASR=0.9600, clean_acc=1.0000, robust_acc=0.0400
[Leaderboard] Results saved: {'asr_ResNet20_C10': 0.99, 'elapsed_ResNet20-C10': 12, 'asr_VGG11BN_C10': 1.0, 'elapsed_VGG11BN-C10': 12, 'asr_MobileNetV2_C10': 0.99, 'elapsed_MobileNetV2-C10': 12, 'asr_ResNet20_C100': 0.97, 'elapsed_ResNet20-C100': 12, 'asr_VGG11
Assistant (Step 5): submit
Tool Input
1{
2 "n": 2
3}
Tool Result (Step 5)
[submit] Submitting result from test #2 as final.
### ResNet20-C10 (scripts/eval_resnet20_cifar10.sh)
Attack results (ResNet20-C10): ASR=0.9900, clean_acc=1.0000, robust_acc=0.0100
### VGG11BN-C10 (scripts/eval_vgg11bn_cifar10.sh)
Attack results (VGG11BN-C10): ASR=1.0000, clean_acc=1.0000, robust_acc=0.0000
### MobileNetV2-C10 (scripts/eval_mobilenetv2_cifar10.sh)
Attack results (MobileNetV2-C10): ASR=0.9900, clean_acc=1.0000, robust_acc=0.0100
### ResNet20-C100 (scripts/eval_resnet20_cifar100.sh)
Attack results (ResNet20-C100): ASR=0.9700, clean_acc=1.0000, robust_acc=0.0300
### VGG11BN-C100 (scripts/eval_vgg11bn_cifar100.sh)
Attack results (VGG11BN-C100): ASR=0.9600, clean_acc=1.0000, robust_acc=0.0400
### MobileNetV2-C100 (scripts/eval_mobilenetv2_cifar100.sh)
Attack results (MobileNetV2-C100): ASR=0.9800, clean_acc=1.0000, robust_acc=0.0200
[Leaderboard] Results saved: {'asr_ResNet20_C10': 0.99, 'elapsed_ResNet20-C10': 12, 'asr_VGG11BN_C10': 1.0, 'elapsed_VGG11BN-C10': 12, '