security-adversarial-attack-sparse-l0
Adversarial ML · torchattacks · rigorous codebase
Description
Sparse Adversarial Attack (L0 Constraint)
Objective
Implement a stronger sparse attack in bench/custom_attack.py.
Your method should maximize attack success rate (ASR) under a strict L0 perturbation budget:
- Threat model: full model access for custom attack implementation.
- Norm constraint: number of modified spatial pixels is bounded.
- Budget:
`L0(x_adv, x) <= pixels`, where `pixels = 10`.
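Concretely, a spatial pixel counts as modified if any of its channels changes. A minimal sketch of such an L0 count (an illustration of the constraint, not the harness's exact code):

```python
import torch

def l0_pixels_modified(x_adv: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Count modified spatial pixels per sample for (N, C, H, W) batches.

    A pixel (h, w) counts as modified if any channel differs.
    """
    diff = (x_adv != x).any(dim=1)       # (N, H, W) boolean mask
    return diff.flatten(1).sum(dim=1)    # (N,) modified-pixel counts

x = torch.zeros(2, 3, 32, 32)
x_adv = x.clone()
x_adv[0, 0, 0, 0] = 1.0   # one channel of one pixel in sample 0
x_adv[1, :, 5, 5] = 1.0   # all channels of one pixel in sample 1
print(l0_pixels_modified(x_adv, x))   # tensor([1, 1])
```

Note that changing a single channel costs the same as changing all three: the budget is on spatial locations, not scalar entries.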
Editable Interface
You must implement:
`run_attack(model, images, labels, pixels, device, n_classes) -> adv_images`
Inputs:
- `images`: tensor of shape `(N, C, H, W)`, values in `[0, 1]`.
- `labels`: tensor of shape `(N,)`.
- `pixels`: maximum number of modified spatial pixels per sample.
- `n_classes`: 10 for CIFAR-10, 100 for CIFAR-100.
Output:
`adv_images`: same shape as `images`, also in `[0, 1]`.
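A minimal implementation that conforms to this interface might look as follows. The pixel-selection heuristic (one signed-gradient step on the `pixels` highest-saliency locations) is an illustrative assumption, not a prescribed method:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def run_attack(model, images, labels, pixels, device, n_classes):
    """Illustrative sparse attack: perturb only the top-`pixels` salient pixels."""
    images = images.to(device)
    labels = labels.to(device)
    x = images.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), labels)
    loss.backward()
    # Saliency per spatial pixel: L1 norm over channels of the input gradient.
    sal = x.grad.abs().sum(dim=1)                      # (N, H, W)
    n, h, w = sal.shape
    topk = sal.flatten(1).topk(pixels, dim=1).indices  # (N, pixels)
    mask = torch.zeros(n, h * w, device=device)
    mask.scatter_(1, topk, 1.0)
    mask = mask.view(n, 1, h, w)                       # broadcast over channels
    # Push the selected pixels in the gradient-ascent direction, then clamp.
    # The step is large on purpose: only the L0 budget is constrained.
    adv = images + mask * x.grad.sign()
    return adv.clamp(0.0, 1.0)
```

Because the mask selects at most `pixels` spatial locations per sample, the output always satisfies the L0 budget regardless of the step size.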
Evaluation Protocol
Each evaluation script:
- Loads one pretrained model.
- Collects up to 1000 samples that are initially classified correctly.
- Runs your `run_attack`.
- Checks L0 validity (<= `pixels` modified spatial pixels).
- Reports: `clean_acc`, `robust_acc`, and `asr = 1 - robust_acc`.
Important:
- ASR denominator is the number of initially correct samples.
- Invalid adversarial outputs (shape mismatch or violated budget) are treated as failure.
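Putting these rules together, the scoring could be computed roughly like this (a sketch of the protocol above; the harness's actual bookkeeping may differ):

```python
import torch

@torch.no_grad()
def score(model, images, labels, adv_images, pixels):
    """Robust accuracy and ASR over the initially correct samples."""
    clean_ok = model(images).argmax(1) == labels     # ASR denominator mask
    n_correct = int(clean_ok.sum())
    # Budget check: samples that violate the L0 budget count as failed
    # attacks, i.e. they remain "correct" for robust accuracy.
    l0 = (adv_images != images).any(dim=1).flatten(1).sum(dim=1)
    valid = l0 <= pixels
    adv_ok = model(adv_images).argmax(1) == labels   # still classified right
    survived = clean_ok & (adv_ok | ~valid)
    robust_acc = int(survived.sum()) / n_correct
    return robust_acc, 1.0 - robust_acc              # (robust_acc, asr)
```

The key consequence: an over-budget perturbation can only hurt your ASR, so clamping to the budget inside `run_attack` is always worthwhile.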
Scenarios
Six scenarios are evaluated in parallel:
- ResNet20 on CIFAR-10
- VGG11-BN on CIFAR-10
- MobileNetV2 on CIFAR-10
- ResNet20 on CIFAR-100
- VGG11-BN on CIFAR-100
- MobileNetV2 on CIFAR-100
Baselines
- `onepixel`: one-pixel, differential-evolution-based sparse baseline.
- `sparsefool`: gradient-based sparse perturbation baseline.
- `jsma`: Jacobian saliency map based targeted sparse baseline.
- `pixle`: pixel-rearrangement-based sparse baseline.
Your goal is to improve ASR while respecting the L0 budget.
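One common way to strengthen a single-step sparse attack (an assumption about what tends to work, not a prescribed solution) is to iterate: fix a binary mask of allowed pixels, then take many small signed-gradient steps restricted to that mask, so the L0 cost never grows while the loss keeps climbing:

```python
import torch
import torch.nn.functional as F

def masked_pgd(model, images, labels, mask, steps=20, alpha=0.1):
    """PGD restricted to a binary pixel mask of shape (N, 1, H, W).

    Only masked pixels ever change, so the L0 budget is fixed up front.
    """
    adv = images.clone()
    for _ in range(steps):
        adv = adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        loss.backward()
        with torch.no_grad():
            adv = adv + alpha * mask * adv.grad.sign()  # update masked pixels only
            adv = adv.clamp(0.0, 1.0)
    return adv.detach()
```

The mask itself could come from gradient saliency, from a search over candidate pixels, or from a baseline like `onepixel`; the iteration only refines the values at those locations.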
Code
custom_attack.py
```python
import torch
import torch.nn as nn

# =====================================================================
# EDITABLE: implement run_attack below
# =====================================================================
def run_attack(
    model: nn.Module,
    images: torch.Tensor,
    labels: torch.Tensor,
    pixels: int,
    device: torch.device,
    n_classes: int,
) -> torch.Tensor:
    """
```
run_eval.py
```python
"""Evaluation harness for sparse L0 adversarial attack task."""

import argparse
import copy
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms

from custom_attack import run_attack


def parse_args() -> argparse.Namespace:
```
Additional context files (read-only):
torchattacks/torchattacks/attacks/_differential_evolution.py
Results
| Model | Type | asr ResNet20 C10 ↑ | asr MobileNetV2 C10 ↑ | asr ResNet20 C100 ↑ | asr MobileNetV2 C100 ↑ | asr VGG11BN C100 ↑ | asr VGG11BN C10 ↑ |
|---|---|---|---|---|---|---|---|
| jsma | baseline | 0.140 | 0.090 | 0.000 | 0.000 | - | 0.160 |
| onepixel | baseline | 0.890 | 0.820 | 0.840 | 0.870 | 0.760 | 0.660 |
| pixle | baseline | 0.480 | 0.500 | 0.590 | 0.490 | 0.400 | 0.270 |
| sparsefool | baseline | 0.310 | 0.350 | 0.310 | 0.210 | 0.390 | 0.410 |
| anthropic/claude-opus-4.6 | vanilla | 0.990 | 0.960 | 0.910 | 0.950 | 0.900 | 0.970 |
| google/gemini-3.1-pro-preview | vanilla | - | - | - | - | - | - |
| gpt-5.4-pro | vanilla | 0.970 | 0.960 | 0.910 | 0.950 | 0.920 | 0.990 |
| anthropic/claude-opus-4.6 | agent | 0.990 | 0.940 | 0.930 | 0.950 | 0.920 | 0.990 |
| google/gemini-3.1-pro-preview | agent | 1.000 | 1.000 | 0.990 | 0.980 | 0.970 | 1.000 |
| gpt-5.4-pro | agent | 0.990 | 0.990 | 0.970 | 0.980 | 0.960 | 1.000 |