security-adversarial-attack-sparse-l0

Tags: Adversarial ML · torchattacks · rigorous codebase

Description

Sparse Adversarial Attack (L0 Constraint)

Objective

Implement a stronger sparse attack in bench/custom_attack.py. Your method should maximize attack success rate (ASR) under a strict L0 perturbation budget:

  • Threat model: white-box; the custom attack has full access to the model.
  • Norm constraint: number of modified spatial pixels is bounded.
  • Budget: L0(x_adv, x) <= pixels, where pixels = 10.
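The budget counts spatial pixels, not channel values: a pixel (h, w) is "modified" if any of its channels changed. A minimal sketch of this per-sample check (hypothetical helper names, not the harness's own code):

```python
import torch

def l0_spatial(x_adv: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Count modified spatial pixels per sample.

    A pixel (h, w) counts as modified if ANY of its channels changed,
    matching the per-pixel (not per-channel) L0 budget.
    """
    diff = (x_adv != x).any(dim=1)     # (N, H, W): True where any channel differs
    return diff.flatten(1).sum(dim=1)  # (N,): number of modified pixels

def is_valid(x_adv: torch.Tensor, x: torch.Tensor, pixels: int = 10) -> torch.Tensor:
    """Boolean mask of samples that satisfy the L0 budget."""
    return l0_spatial(x_adv, x) <= pixels
```

Note that changing all three channels of one pixel still costs only one unit of budget, so sparse attacks are free to use the full color range at each chosen location.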

Editable Interface

You must implement:

run_attack(model, images, labels, pixels, device, n_classes) -> adv_images

Inputs:

  • model: the pretrained classifier under attack.
  • images: tensor of shape (N, C, H, W), values in [0, 1].
  • labels: tensor of shape (N,) of ground-truth class indices.
  • pixels: maximum number of modified spatial pixels per sample.
  • device: device on which the model and tensors reside.
  • n_classes: 10 for CIFAR-10, 100 for CIFAR-100.

Output:

  • adv_images: same shape as images, also in [0, 1].
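To make the interface concrete, here is an illustrative (deliberately simple, single-step) implementation: it ranks pixels by input-gradient saliency and pushes the top `pixels` locations to the clipped extreme that increases the loss. This is a sketch of a valid submission, not the reference solution, and stronger methods will iterate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def run_attack(model, images, labels, pixels, device, n_classes):
    """Illustrative one-step sparse attack: perturb the `pixels`
    spatial locations with the largest input-gradient saliency."""
    model.eval()
    x = images.to(device)
    y = labels.to(device)

    x_req = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_req), y)
    grad = torch.autograd.grad(loss, x_req)[0]               # (N, C, H, W)

    saliency = grad.abs().sum(dim=1)                         # (N, H, W) per-pixel score
    n, h, w = saliency.shape
    topk = saliency.flatten(1).topk(pixels, dim=1).indices   # (N, pixels)

    mask = torch.zeros(n, h * w, device=device)
    mask.scatter_(1, topk, 1.0)
    mask = mask.view(n, 1, h, w)                             # broadcast over channels

    # At each chosen pixel, move every channel to the extreme value
    # in the direction that increases the loss.
    target = torch.where(grad > 0, torch.ones_like(x), torch.zeros_like(x))
    adv = x * (1 - mask) + target * mask
    return adv.clamp(0, 1).detach()
```

Because only masked pixels are touched, the output satisfies the L0 budget by construction, and the clamp keeps values in [0, 1].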

Evaluation Protocol

Each evaluation script:

  1. Loads one pretrained model.
  2. Collects up to 1000 samples that are initially classified correctly.
  3. Runs your run_attack.
  4. Checks L0 validity (<= pixels modified spatial pixels).
  5. Reports:
    • clean_acc
    • robust_acc
    • asr = 1 - robust_acc

Important:

  • ASR denominator is the number of initially correct samples.
  • Invalid adversarial outputs (shape mismatch or violated budget) are treated as failure.
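Steps 4-5 and the failure rules above can be sketched as follows (a hypothetical helper mirroring the protocol, not the harness's actual code; it assumes the batch already contains only initially-correct samples, per step 2, so the ASR denominator is the batch size):

```python
import torch
import torch.nn as nn

def score(model: nn.Module, x_clean, y, x_adv, pixels: int = 10):
    """Return (robust_acc, asr) under the protocol's failure rules."""
    if x_adv.shape != x_clean.shape:
        # Shape mismatch: the entire batch counts as a failed attack.
        return 1.0, 0.0
    with torch.no_grad():
        pred = model(x_adv).argmax(dim=1)
    # Per-sample L0 budget check; violating samples count as failures.
    valid = (x_adv != x_clean).any(dim=1).flatten(1).sum(dim=1) <= pixels
    fooled = valid & (pred != y)
    robust_acc = 1.0 - fooled.float().mean().item()
    return robust_acc, 1.0 - robust_acc   # asr = 1 - robust_acc
```

The key consequence: a budget-violating adversarial example scores exactly the same as one that fails to flip the prediction, so there is no incentive to overspend the budget.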

Scenarios

Six scenarios are evaluated in parallel:

  • ResNet20 on CIFAR-10
  • VGG11-BN on CIFAR-10
  • MobileNetV2 on CIFAR-10
  • ResNet20 on CIFAR-100
  • VGG11-BN on CIFAR-100
  • MobileNetV2 on CIFAR-100

Baselines

  • onepixel: differential-evolution-based one-pixel sparse baseline.
  • sparsefool: gradient-based sparse perturbation baseline.
  • jsma: Jacobian-saliency-map-based targeted sparse baseline.
  • pixle: pixel-rearrangement-based sparse baseline.

Your goal is to improve ASR while respecting the L0 budget.
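Since invalid outputs count as outright failures, one robust pattern is to project any candidate perturbation back onto the budget before returning it: keep only the `pixels` locations with the largest change and revert the rest. A sketch of such a projection (a hypothetical post-processing helper, not part of the harness):

```python
import torch

def project_l0(x_adv: torch.Tensor, x: torch.Tensor, pixels: int) -> torch.Tensor:
    """Revert all but the `pixels` spatial locations with the largest
    change, so the result always satisfies the L0 budget."""
    delta = x_adv - x
    mag = delta.abs().sum(dim=1)                    # (N, H, W) per-pixel change size
    n, h, w = mag.shape
    k = min(pixels, h * w)
    topk = mag.flatten(1).topk(k, dim=1).indices    # keep the k largest changes
    mask = torch.zeros(n, h * w, device=x.device)
    mask.scatter_(1, topk, 1.0)
    return (x + delta * mask.view(n, 1, h, w)).clamp(0, 1)
```

Running this projection on every output guarantees L0 validity regardless of how the inner optimizer behaves, at the cost of possibly weakening an over-budget perturbation.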

Code

custom_attack.py
import torch
import torch.nn as nn

# =====================================================================
# EDITABLE: implement run_attack below
# =====================================================================
def run_attack(
    model: nn.Module,
    images: torch.Tensor,
    labels: torch.Tensor,
    pixels: int,
    device: torch.device,
    n_classes: int,
) -> torch.Tensor:
    """
run_eval.py
"""Evaluation harness for sparse L0 adversarial attack task."""

import argparse
import copy
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms

from custom_attack import run_attack


def parse_args() -> argparse.Namespace:

Additional context files (read-only):

  • torchattacks/torchattacks/attacks/_differential_evolution.py

Results

All values are attack success rate (ASR); "-" indicates no result.

Model                          Type      ResNet20/C10  MobileNetV2/C10  ResNet20/C100  MobileNetV2/C100  VGG11-BN/C100  VGG11-BN/C10
jsma                           baseline  0.140         0.090            0.000          0.000             -              0.160
onepixel                       baseline  0.890         0.820            0.840          0.870             0.760          0.660
pixle                          baseline  0.480         0.500            0.590          0.490             0.400          0.270
sparsefool                     baseline  0.310         0.350            0.310          0.210             0.390          0.410
anthropic/claude-opus-4.6      vanilla   0.990         0.960            0.910          0.950             0.900          0.970
google/gemini-3.1-pro-preview  vanilla   -             -                -              -                 -              -
gpt-5.4-pro                    vanilla   0.970         0.960            0.910          0.950             0.920          0.990
anthropic/claude-opus-4.6      agent     0.990         0.940            0.930          0.950             0.920          0.990
google/gemini-3.1-pro-preview  agent     1.000         1.000            0.990          0.980             0.970          1.000
gpt-5.4-pro                    agent     0.990         0.990            0.970          0.980             0.960          1.000

Agent Conversations