security-adversarial-attack-white-box-linf
Adversarial ML, torchattacks, rigorous codebase
Description
White-Box Evasion Attack under Linf Constraint
Objective
Implement a stronger white-box L_inf attack in bench/custom_attack.py.
Your method should maximize attack success rate (ASR) under a strict perturbation budget:
- Threat model: white-box (full model access, including gradients).
- Norm constraint: ||x_adv - x||_inf <= eps.
- Budget: eps = 4/255.
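The norm constraint and the [0, 1] pixel range can both be enforced by a projection step. The following is a minimal sketch, assuming PyTorch tensors in (N, C, H, W) layout; the helper names `project_linf` and `linf_distance` are illustrative and do not come from the benchmark code.

```python
import torch


def project_linf(x_adv: torch.Tensor, x: torch.Tensor, eps: float) -> torch.Tensor:
    """Project x_adv onto the L_inf ball of radius eps around x, then onto [0, 1]."""
    # Clip the perturbation elementwise so that |x_adv - x| <= eps.
    delta = torch.clamp(x_adv - x, min=-eps, max=eps)
    # Clip to the valid pixel range; since x is already in [0, 1],
    # this can only move points closer to x, so the norm bound still holds.
    return torch.clamp(x + delta, min=0.0, max=1.0)


def linf_distance(x_adv: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Per-sample L_inf distance, returned as a tensor of shape (N,)."""
    return (x_adv - x).abs().flatten(start_dim=1).max(dim=1).values


eps = 4 / 255
x = torch.rand(8, 3, 32, 32)                      # stand-in for clean images
x_adv = project_linf(x + 0.1 * torch.randn_like(x), x, eps)
print(linf_distance(x_adv, x).max().item() <= eps + 1e-6)
```

The small tolerance in the final check accounts for float32 rounding; whether the evaluation harness uses a tolerance is not stated in the task description.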
Editable Interface
You must implement:
run_attack(model, images, labels, eps, device, n_classes) -> adv_images
Inputs:
- images: tensor of shape (N, C, H, W), values in [0, 1].
- labels: tensor of shape (N,).
- n_classes: 10 for CIFAR-10, 100 for CIFAR-100.
Output:
- adv_images: same shape as images, also in [0, 1].
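To make the interface concrete, here is a minimal sketch of a function with the required signature, filled in with a plain PGD-style loop. It is a starting point rather than the intended solution: the step count, step size, random start, and cross-entropy loss are all assumptions, and `n_classes` goes unused in this simple version.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def run_attack(model, images, labels, eps, device, n_classes):
    steps, alpha = 20, eps / 4        # assumed hyperparameters, not from the task
    model.eval()
    images = images.clone().detach().to(device)
    labels = labels.clone().detach().to(device)

    # Random start inside the eps-ball, clipped to the valid pixel range.
    adv = images + torch.empty_like(images).uniform_(-eps, eps)
    adv = adv.clamp(0.0, 1.0).detach()

    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        (grad,) = torch.autograd.grad(loss, adv)
        # Signed-gradient ascent step, then projection back onto the eps-ball.
        adv = adv.detach() + alpha * grad.sign()
        adv = images + (adv - images).clamp(-eps, eps)
        adv = adv.clamp(0.0, 1.0).detach()
    return adv


# Usage on a toy linear classifier (a stand-in for the pretrained models).
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
adv = run_attack(toy_model, x, y, eps=4 / 255, device=torch.device("cpu"), n_classes=10)
print(adv.shape, (adv - x).abs().max().item())
```

The output keeps the input shape and stays within both the eps-ball and [0, 1], which are the two properties the task description requires of adv_images.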
Evaluation Protocol
Each evaluation script:
- Loads one pretrained model.
- Collects up to 1000 samples that are initially classified correctly.
- Runs your run_attack.
- Checks L_inf validity.
- Reports:
  - clean_acc
  - robust_acc
  - asr = 1 - robust_acc
Important:
- ASR denominator is the number of initially correct samples.
- Invalid adversarial outputs (shape mismatch or violated norm) are treated as failure.
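The metric described above can be sketched as follows. This is an illustration of the stated rules, not the actual harness in run_eval.py: the function name `attack_success_rate`, the tolerance `tol`, and the choice to score a whole invalid batch as ASR 0 are assumptions.

```python
import torch


@torch.no_grad()
def attack_success_rate(model, images, labels, adv_images, eps, tol=1e-6):
    model.eval()
    # Denominator: samples that the model classifies correctly before the attack.
    correct = model(images).argmax(dim=1) == labels
    n_correct = int(correct.sum())
    if n_correct == 0:
        return 0.0
    # Invalid outputs (shape mismatch or violated norm) count as failure.
    if adv_images.shape != images.shape:
        return 0.0
    if (adv_images - images).abs().max().item() > eps + tol:
        return 0.0
    # robust_acc: fraction of initially correct samples that stay correct.
    still_correct = model(adv_images).argmax(dim=1) == labels
    robust_acc = (still_correct & correct).sum().item() / n_correct
    return 1.0 - robust_acc


class MeanThreshold(torch.nn.Module):
    """Toy two-class model: predicts class 0 when the mean pixel exceeds 0.5."""

    def forward(self, x):
        m = x.flatten(1).mean(dim=1)
        return torch.stack([m - 0.5, 0.5 - m], dim=1)


model = MeanThreshold()
images = torch.full((4, 3, 8, 8), 0.505)
labels = torch.zeros(4, dtype=torch.long)
eps = 4 / 255
print(attack_success_rate(model, images, labels, images - 0.01, eps))  # within budget
print(attack_success_rate(model, images, labels, images - 0.05, eps))  # exceeds budget
```

In the toy example, a shift of 0.01 stays inside the budget and flips every prediction, while a shift of 0.05 violates the norm constraint and is scored as a failed attack.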
Scenarios
Six scenarios are evaluated in parallel:
- ResNet20 on CIFAR-10
- VGG11-BN on CIFAR-10
- MobileNetV2 on CIFAR-10
- ResNet20 on CIFAR-100
- VGG11-BN on CIFAR-100
- MobileNetV2 on CIFAR-100
Baselines
- fgsm: one-step FGSM baseline (simplest first-order attack).
- pgd: iterative PGD baseline (strong first-order baseline).
- mifgsm: momentum iterative FGSM.
- autoattack: torchattacks.AutoAttack(version="standard") as a strong upper baseline.
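For orientation, the momentum iterative FGSM baseline can be written out by hand as below. This is a sketch of the general technique in plain PyTorch and not the benchmark's baseline configuration: the function name `mifgsm_sketch`, the step count, the decay factor, and the step size alpha = eps / steps are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mifgsm_sketch(model, images, labels, eps, steps=10, decay=1.0):
    alpha = eps / steps               # assumed step size
    model.eval()
    adv = images.clone().detach()
    momentum = torch.zeros_like(images)
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        (grad,) = torch.autograd.grad(loss, adv)
        # Normalize each sample's gradient by its L1 norm before accumulating.
        l1 = grad.abs().flatten(1).sum(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        momentum = decay * momentum + grad / l1
        # Step along the sign of the accumulated gradient, then project.
        adv = adv.detach() + alpha * momentum.sign()
        adv = images + (adv - images).clamp(-eps, eps)
        adv = adv.clamp(0.0, 1.0).detach()
    return adv


# Usage on a toy linear classifier (a stand-in for the pretrained models).
torch.manual_seed(0)
toy = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
x_mi = mifgsm_sketch(toy, x, y, eps=4 / 255)
x_fgsm = mifgsm_sketch(toy, x, y, eps=4 / 255, steps=1)
print((x_mi - x).abs().max().item(), (x_fgsm - x).abs().max().item())
```

With steps=1 the update reduces to the one-step FGSM baseline, since the sign of the normalized gradient equals the sign of the raw gradient and alpha equals eps.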
Your goal is to improve ASR while respecting the L_inf budget.
Code
custom_attack.py
    import torch
    import torch.nn as nn

    # =====================================================================
    # EDITABLE: implement run_attack below
    # =====================================================================
    def run_attack(
        model: nn.Module,
        images: torch.Tensor,
        labels: torch.Tensor,
        eps: float,
        device: torch.device,
        n_classes: int,
    ) -> torch.Tensor:
        """
run_eval.py
    """Evaluation harness for white-box Linf adversarial attack task."""

    import argparse
    import copy
    import random

    import numpy as np
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from torchvision import datasets, transforms

    from custom_attack import run_attack


    def parse_args() -> argparse.Namespace:
Results
| Method | Type | ASR ResNet20 CIFAR-10 ↑ | ASR VGG11-BN CIFAR-10 ↑ | ASR ResNet20 CIFAR-100 ↑ | ASR VGG11-BN CIFAR-100 ↑ | ASR MobileNetV2 CIFAR-100 ↑ |
|---|---|---|---|---|---|---|
| autoattack | baseline | 1.000 | 0.935 | 1.000 | 0.948 | 0.999 |
| fgsm | baseline | 0.924 | 0.807 | 0.877 | 0.791 | 0.884 |
| mifgsm | baseline | 1.000 | 0.927 | 1.000 | 0.938 | - |
| mifgsm | baseline | - | - | - | - | 0.999 |
| pgd | baseline | 1.000 | 0.945 | 1.000 | 0.950 | 0.998 |
| anthropic/claude-opus-4.6 | vanilla | 1.000 | 0.959 | 1.000 | 0.958 | 0.999 |
| google/gemini-3.1-pro-preview | vanilla | 1.000 | 0.948 | 1.000 | 0.940 | 0.999 |
| gpt-5.4-pro | vanilla | 1.000 | 0.952 | 1.000 | 0.954 | 0.999 |
| anthropic/claude-opus-4.6 | agent | 1.000 | 0.963 | 1.000 | 0.961 | 0.999 |
| google/gemini-3.1-pro-preview | agent | 1.000 | 0.956 | 1.000 | 0.955 | 0.999 |
| gpt-5.4-pro | agent | 1.000 | 0.952 | 1.000 | 0.954 | 0.999 |