security-adversarial-attack-sparse-l0

Tags: Adversarial ML · torchattacks · rigorous codebase

Description

Sparse Adversarial Attack (L0 Constraint)

Objective

Implement a stronger sparse attack in bench/custom_attack.py. Your method should maximize attack success rate (ASR) under a strict L0 perturbation budget:

  • Threat model: white-box; the custom attack has full access to the model.
  • Norm constraint: number of modified spatial pixels is bounded.
  • Budget: L0(x_adv, x) <= pixels, where pixels = 10.
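The budget counts spatial pixels, not channel values: a pixel (h, w) is "modified" if any of its channels changed. A minimal sketch of this per-sample check (hypothetical helper names, not the harness's own code):

```python
import torch

def l0_spatial(x_adv: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Count modified spatial pixels per sample.

    A pixel (h, w) counts as modified if ANY of its channels changed,
    matching the per-pixel (not per-channel) L0 budget.
    """
    diff = (x_adv != x).any(dim=1)     # (N, H, W): True where any channel differs
    return diff.flatten(1).sum(dim=1)  # (N,): number of modified pixels

def is_valid(x_adv: torch.Tensor, x: torch.Tensor, pixels: int = 10) -> torch.Tensor:
    """Boolean mask of samples that satisfy the L0 budget."""
    return l0_spatial(x_adv, x) <= pixels
```

Note that changing all three channels of one pixel still costs only one unit of budget, so sparse attacks are free to use the full color range at each chosen location.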

Editable Interface

You must implement:

run_attack(model, images, labels, pixels, device, n_classes) -> adv_images

Inputs:

  • model: the pretrained classifier under attack.
  • images: tensor of shape (N, C, H, W), values in [0, 1].
  • labels: tensor of shape (N,) of ground-truth class indices.
  • pixels: maximum number of modified spatial pixels per sample.
  • device: device on which the model and tensors reside.
  • n_classes: 10 for CIFAR-10, 100 for CIFAR-100.

Output:

  • adv_images: same shape as images, also in [0, 1].
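To make the interface concrete, here is an illustrative (deliberately simple, single-step) implementation: it ranks pixels by input-gradient saliency and pushes the top `pixels` locations to the clipped extreme that increases the loss. This is a sketch of a valid submission, not the reference solution, and stronger methods will iterate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def run_attack(model, images, labels, pixels, device, n_classes):
    """Illustrative one-step sparse attack: perturb the `pixels`
    spatial locations with the largest input-gradient saliency."""
    model.eval()
    x = images.to(device)
    y = labels.to(device)

    x_req = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_req), y)
    grad = torch.autograd.grad(loss, x_req)[0]               # (N, C, H, W)

    saliency = grad.abs().sum(dim=1)                         # (N, H, W) per-pixel score
    n, h, w = saliency.shape
    topk = saliency.flatten(1).topk(pixels, dim=1).indices   # (N, pixels)

    mask = torch.zeros(n, h * w, device=device)
    mask.scatter_(1, topk, 1.0)
    mask = mask.view(n, 1, h, w)                             # broadcast over channels

    # At each chosen pixel, move every channel to the extreme value
    # in the direction that increases the loss.
    target = torch.where(grad > 0, torch.ones_like(x), torch.zeros_like(x))
    adv = x * (1 - mask) + target * mask
    return adv.clamp(0, 1).detach()
```

Because only masked pixels are touched, the output satisfies the L0 budget by construction, and the clamp keeps values in [0, 1].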

Evaluation Protocol

Each evaluation script:

  1. Loads one pretrained model.
  2. Collects up to 1000 samples that are initially classified correctly.
  3. Runs your run_attack.
  4. Checks L0 validity (<= pixels modified spatial pixels).
  5. Reports:
    • clean_acc
    • robust_acc
    • asr = 1 - robust_acc

Important:

  • ASR denominator is the number of initially correct samples.
  • Invalid adversarial outputs (shape mismatch or violated budget) are treated as failure.
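Steps 4-5 and the failure rules above can be sketched as follows (a hypothetical helper mirroring the protocol, not the harness's actual code; it assumes the batch already contains only initially-correct samples, per step 2, so the ASR denominator is the batch size):

```python
import torch
import torch.nn as nn

def score(model: nn.Module, x_clean, y, x_adv, pixels: int = 10):
    """Return (robust_acc, asr) under the protocol's failure rules."""
    if x_adv.shape != x_clean.shape:
        # Shape mismatch: the entire batch counts as a failed attack.
        return 1.0, 0.0
    with torch.no_grad():
        pred = model(x_adv).argmax(dim=1)
    # Per-sample L0 budget check; violating samples count as failures.
    valid = (x_adv != x_clean).any(dim=1).flatten(1).sum(dim=1) <= pixels
    fooled = valid & (pred != y)
    robust_acc = 1.0 - fooled.float().mean().item()
    return robust_acc, 1.0 - robust_acc   # asr = 1 - robust_acc
```

The key consequence: a budget-violating adversarial example scores exactly the same as one that fails to flip the prediction, so there is no incentive to overspend the budget.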

Scenarios

Six scenarios are evaluated in parallel:

  • ResNet20 on CIFAR-10
  • VGG11-BN on CIFAR-10
  • MobileNetV2 on CIFAR-10
  • ResNet20 on CIFAR-100
  • VGG11-BN on CIFAR-100
  • MobileNetV2 on CIFAR-100

Baselines

  • onepixel: differential-evolution-based one-pixel sparse baseline.
  • sparsefool: gradient-based sparse perturbation baseline.
  • jsma: Jacobian-saliency-map-based targeted sparse baseline.
  • pixle: pixel-rearrangement-based sparse baseline.

Your goal is to improve ASR while respecting the L0 budget.
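Since invalid outputs count as outright failures, one robust pattern is to project any candidate perturbation back onto the budget before returning it: keep only the `pixels` locations with the largest change and revert the rest. A sketch of such a projection (a hypothetical post-processing helper, not part of the harness):

```python
import torch

def project_l0(x_adv: torch.Tensor, x: torch.Tensor, pixels: int) -> torch.Tensor:
    """Revert all but the `pixels` spatial locations with the largest
    change, so the result always satisfies the L0 budget."""
    delta = x_adv - x
    mag = delta.abs().sum(dim=1)                    # (N, H, W) per-pixel change size
    n, h, w = mag.shape
    k = min(pixels, h * w)
    topk = mag.flatten(1).topk(k, dim=1).indices    # keep the k largest changes
    mask = torch.zeros(n, h * w, device=x.device)
    mask.scatter_(1, topk, 1.0)
    return (x + delta * mask.view(n, 1, h, w)).clamp(0, 1)
```

Running this projection on every output guarantees L0 validity regardless of how the inner optimizer behaves, at the cost of possibly weakening an over-budget perturbation.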

Code

custom_attack.py
import torch
import torch.nn as nn

# =====================================================================
# EDITABLE: implement run_attack below
# =====================================================================
def run_attack(
    model: nn.Module,
    images: torch.Tensor,
    labels: torch.Tensor,
    pixels: int,
    device: torch.device,
    n_classes: int,
) -> torch.Tensor:
    """
run_eval.py
"""Evaluation harness for sparse L0 adversarial attack task."""

import argparse
import copy
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms

from custom_attack import run_attack


def parse_args() -> argparse.Namespace:

Additional context files (read-only):

  • torchattacks/torchattacks/attacks/_differential_evolution.py

Results

All values are attack success rate (ASR); "-" indicates no result.

Model                          Type      ResNet20/C10  MobileNetV2/C10  ResNet20/C100  MobileNetV2/C100  VGG11-BN/C100  VGG11-BN/C10
jsma                           baseline  0.140         0.090            0.000          0.000             -              0.160
onepixel                       baseline  0.890         0.820            0.840          0.870             0.760          0.660
pixle                          baseline  0.480         0.500            0.590          0.490             0.400          0.270
sparsefool                     baseline  0.310         0.350            0.310          0.210             0.390          0.410
anthropic/claude-opus-4.6      vanilla   0.990         0.960            0.910          0.950             0.900          0.970
google/gemini-3.1-pro-preview  vanilla   -             -                -              -                 -              -
gpt-5.4-pro                    vanilla   0.970         0.960            0.910          0.950             0.920          0.990
anthropic/claude-opus-4.6      agent     0.990         0.940            0.930          0.950             0.920          0.990
google/gemini-3.1-pro-preview  agent     1.000         1.000            0.990          0.980             0.970          1.000
gpt-5.4-pro                    agent     0.990         0.990            0.970          0.980             0.960          1.000

Agent Conversations