security-adversarial-attack-white-box-linf

Adversarial ML · torchattacks · rigorous codebase

Description

White-Box Evasion Attack under an L_inf Constraint

Objective

Implement a stronger white-box L_inf attack in bench/custom_attack.py. Your method should maximize the attack success rate (ASR) under a strict perturbation budget:

  • Threat model: white-box (full model access, including gradients).
  • Norm constraint: ||x_adv - x||_inf <= eps.
  • Budget: eps = 4/255.
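The budget above imposes two constraints at once: every pixel of the adversarial image must stay within eps of the original, and the result must remain a valid image in [0, 1]. A minimal projection helper (illustrative only, not part of the task files) makes both explicit:

```python
import torch

def project_linf(x_adv: torch.Tensor, x: torch.Tensor, eps: float) -> torch.Tensor:
    """Clip the perturbation to the eps-ball, then clip pixels back into [0, 1]."""
    delta = torch.clamp(x_adv - x, min=-eps, max=eps)  # enforce ||x_adv - x||_inf <= eps
    return torch.clamp(x + delta, min=0.0, max=1.0)    # enforce valid pixel range
```

Because the pixel clip only moves values back toward x, it never re-violates the eps-ball, so applying the two clips in this order satisfies both constraints.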

Editable Interface

You must implement:

run_attack(model, images, labels, eps, device, n_classes) -> adv_images

Inputs:

  • images: tensor of shape (N, C, H, W), values in [0, 1].
  • labels: tensor of shape (N,).
  • n_classes: 10 for CIFAR-10, 100 for CIFAR-100.

Output:

  • adv_images: same shape as images, also in [0, 1].
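As a reference point, the required interface can be satisfied by a plain PGD loop. The sketch below is illustrative: the step size, iteration count, and random start are assumptions, not the benchmark's settings, and a submission aiming to beat the baselines would go further.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def run_attack(
    model: nn.Module,
    images: torch.Tensor,
    labels: torch.Tensor,
    eps: float,
    device: torch.device,
    n_classes: int,
) -> torch.Tensor:
    """Illustrative PGD loop with random start; returns adversarial images in [0, 1]."""
    model.eval()
    images = images.to(device)
    labels = labels.to(device)
    steps, alpha = 20, eps / 4  # assumed hyperparameters
    adv = images + torch.empty_like(images).uniform_(-eps, eps)  # random start
    adv = adv.clamp(0.0, 1.0)
    for _ in range(steps):
        adv = adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        (grad,) = torch.autograd.grad(loss, adv)
        adv = adv.detach() + alpha * grad.sign()        # ascend the loss
        adv = images + (adv - images).clamp(-eps, eps)  # project onto the eps-ball
        adv = adv.clamp(0.0, 1.0)                       # keep pixels valid
    return adv.detach()
```

Note that the output always satisfies both the shape and the L_inf requirements by construction, regardless of whether the attack succeeds.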

Evaluation Protocol

Each evaluation script:

  1. Loads one pretrained model.
  2. Collects up to 1000 samples that are initially classified correctly.
  3. Runs your run_attack.
  4. Checks L_inf validity.
  5. Reports:
    • clean_acc
    • robust_acc
    • asr = 1 - robust_acc

Important:

  • ASR denominator is the number of initially correct samples.
  • Invalid adversarial outputs (a shape mismatch or a violated norm constraint) are treated as attack failures.
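A validity check in the spirit of step 4 and the failure rule above might look like the following sketch; the exact tolerance the harness uses is an assumption:

```python
import torch

def is_valid(adv: torch.Tensor, clean: torch.Tensor, eps: float, tol: float = 1e-5) -> bool:
    """Reject shape mismatches, out-of-range pixels, and L_inf budget violations."""
    if adv.shape != clean.shape:
        return False
    if adv.min().item() < 0.0 or adv.max().item() > 1.0:
        return False
    return (adv - clean).abs().max().item() <= eps + tol
```

Samples that fail this check keep their clean prediction in the tally, which is why an invalid output can never improve ASR.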

Scenarios

Six scenarios are evaluated in parallel:

  • ResNet20 on CIFAR-10
  • VGG11-BN on CIFAR-10
  • MobileNetV2 on CIFAR-10
  • ResNet20 on CIFAR-100
  • VGG11-BN on CIFAR-100
  • MobileNetV2 on CIFAR-100

Baselines

  • fgsm: one-step FGSM baseline (simplest first-order attack).
  • pgd: iterative PGD baseline (strong first-order baseline).
  • mifgsm: momentum iterative FGSM.
  • autoattack: torchattacks.AutoAttack(version="standard") as a strong upper-bound baseline.

Your goal is to improve ASR over these baselines while respecting the L_inf budget.
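For intuition on why mifgsm outperforms fgsm, here is a compact MI-FGSM sketch in plain PyTorch: instead of stepping on the raw gradient sign, it steps on the sign of an L1-normalized gradient momentum. The decay, step size, and step count are illustrative; the benchmark's own baseline implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mifgsm(model: nn.Module, images: torch.Tensor, labels: torch.Tensor,
           eps: float, steps: int = 10, decay: float = 1.0) -> torch.Tensor:
    """MI-FGSM: accumulate an L1-normalized gradient momentum, step by its sign."""
    alpha = eps / steps
    adv = images.clone()
    momentum = torch.zeros_like(images)
    for _ in range(steps):
        adv = adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        (grad,) = torch.autograd.grad(loss, adv)
        norm = grad.abs().mean(dim=(1, 2, 3), keepdim=True).clamp_min(1e-12)
        momentum = decay * momentum + grad / norm       # momentum smooths noisy gradients
        adv = adv.detach() + alpha * momentum.sign()
        adv = images + (adv - images).clamp(-eps, eps)  # project onto the eps-ball
        adv = adv.clamp(0.0, 1.0)                       # keep pixels valid
    return adv.detach()
```

The momentum term damps oscillation between iterations, which tends to help the attack transfer and escape poor local optima.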

Code

custom_attack.py
import torch
import torch.nn as nn

# =====================================================================
# EDITABLE: implement run_attack below
# =====================================================================
def run_attack(
    model: nn.Module,
    images: torch.Tensor,
    labels: torch.Tensor,
    eps: float,
    device: torch.device,
    n_classes: int,
) -> torch.Tensor:
    """
run_eval.py
"""Evaluation harness for white-box Linf adversarial attack task."""

import argparse
import copy
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms

from custom_attack import run_attack


def parse_args() -> argparse.Namespace:

Results

All values are ASR per scenario; partial duplicate rows from the flattened source have been merged.

| Model                         | Type     | ResNet20 C10 | VGG11-BN C10 | ResNet20 C100 | VGG11-BN C100 | MobileNetV2 C100 |
|-------------------------------|----------|--------------|--------------|---------------|---------------|------------------|
| autoattack                    | baseline | 1.000        | 0.935        | 1.000         | 0.948         | 0.999            |
| fgsm                          | baseline | 0.924        | 0.807        | 0.877         | 0.791         | 0.884            |
| mifgsm                        | baseline | 1.000        | 0.927        | 1.000         | 0.938         | 0.999            |
| pgd                           | baseline | 1.000        | 0.945        | 1.000         | 0.950         | 0.998            |
| anthropic/claude-opus-4.6     | vanilla  | 1.000        | 0.959        | 1.000         | 0.958         | 0.999            |
| google/gemini-3.1-pro-preview | vanilla  | 1.000        | 0.948        | 1.000         | 0.940         | 0.999            |
| gpt-5.4-pro                   | vanilla  | 1.000        | 0.952        | 1.000         | 0.954         | 0.999            |
| anthropic/claude-opus-4.6     | agent    | 1.000        | 0.963        | 1.000         | 0.961         | 0.999            |
| google/gemini-3.1-pro-preview | agent    | 1.000        | 0.956        | 1.000         | 0.955         | 0.999            |
| gpt-5.4-pro                   | agent    | 1.000        | 0.952        | 1.000         | 0.954         | 0.999            |

Agent Conversations