security-adversarial-attack-white-box-linf

Adversarial ML · torchattacks · rigorous codebase

Description

White-Box Evasion Attack under an L_inf Constraint

Objective

Implement a stronger white-box L_inf attack in bench/custom_attack.py. Your method should maximize the attack success rate (ASR) under a strict perturbation budget:

  • Threat model: white-box (full model access, including gradients).
  • Norm constraint: ||x_adv - x||_inf <= eps.
  • Budget: eps = 4/255.
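The budget above imposes two constraints at once: every pixel of the adversarial image must stay within eps of the original, and the result must remain a valid image in [0, 1]. A minimal projection helper (illustrative only, not part of the task files) makes both explicit:

```python
import torch

def project_linf(x_adv: torch.Tensor, x: torch.Tensor, eps: float) -> torch.Tensor:
    """Clip the perturbation to the eps-ball, then clip pixels back into [0, 1]."""
    delta = torch.clamp(x_adv - x, min=-eps, max=eps)  # enforce ||x_adv - x||_inf <= eps
    return torch.clamp(x + delta, min=0.0, max=1.0)    # enforce valid pixel range
```

Because the pixel clip only moves values back toward x, it never re-violates the eps-ball, so applying the two clips in this order satisfies both constraints.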

Editable Interface

You must implement:

run_attack(model, images, labels, eps, device, n_classes) -> adv_images

Inputs:

  • images: tensor of shape (N, C, H, W), values in [0, 1].
  • labels: tensor of shape (N,).
  • n_classes: 10 for CIFAR-10, 100 for CIFAR-100.

Output:

  • adv_images: same shape as images, also in [0, 1].
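As a reference point, the required interface can be satisfied by a plain PGD loop. The sketch below is illustrative: the step size, iteration count, and random start are assumptions, not the benchmark's settings, and a submission aiming to beat the baselines would go further.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def run_attack(
    model: nn.Module,
    images: torch.Tensor,
    labels: torch.Tensor,
    eps: float,
    device: torch.device,
    n_classes: int,
) -> torch.Tensor:
    """Illustrative PGD loop with random start; returns adversarial images in [0, 1]."""
    model.eval()
    images = images.to(device)
    labels = labels.to(device)
    steps, alpha = 20, eps / 4  # assumed hyperparameters
    adv = images + torch.empty_like(images).uniform_(-eps, eps)  # random start
    adv = adv.clamp(0.0, 1.0)
    for _ in range(steps):
        adv = adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        (grad,) = torch.autograd.grad(loss, adv)
        adv = adv.detach() + alpha * grad.sign()        # ascend the loss
        adv = images + (adv - images).clamp(-eps, eps)  # project onto the eps-ball
        adv = adv.clamp(0.0, 1.0)                       # keep pixels valid
    return adv.detach()
```

Note that the output always satisfies both the shape and the L_inf requirements by construction, regardless of whether the attack succeeds.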

Evaluation Protocol

Each evaluation script:

  1. Loads one pretrained model.
  2. Collects up to 1000 samples that are initially classified correctly.
  3. Runs your run_attack.
  4. Checks L_inf validity.
  5. Reports:
    • clean_acc
    • robust_acc
    • asr = 1 - robust_acc

Important:

  • ASR denominator is the number of initially correct samples.
  • Invalid adversarial outputs (a shape mismatch or a violated norm constraint) are treated as attack failures.
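A validity check in the spirit of step 4 and the failure rule above might look like the following sketch; the exact tolerance the harness uses is an assumption:

```python
import torch

def is_valid(adv: torch.Tensor, clean: torch.Tensor, eps: float, tol: float = 1e-5) -> bool:
    """Reject shape mismatches, out-of-range pixels, and L_inf budget violations."""
    if adv.shape != clean.shape:
        return False
    if adv.min().item() < 0.0 or adv.max().item() > 1.0:
        return False
    return (adv - clean).abs().max().item() <= eps + tol
```

Samples that fail this check keep their clean prediction in the tally, which is why an invalid output can never improve ASR.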

Scenarios

Six scenarios are evaluated in parallel:

  • ResNet20 on CIFAR-10
  • VGG11-BN on CIFAR-10
  • MobileNetV2 on CIFAR-10
  • ResNet20 on CIFAR-100
  • VGG11-BN on CIFAR-100
  • MobileNetV2 on CIFAR-100

Baselines

  • fgsm: one-step FGSM baseline (simplest first-order attack).
  • pgd: iterative PGD baseline (strong first-order baseline).
  • mifgsm: momentum iterative FGSM.
  • autoattack: torchattacks.AutoAttack(version="standard") as a strong upper-bound baseline.

Your goal is to improve ASR over these baselines while respecting the L_inf budget.
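For intuition on why mifgsm outperforms fgsm, here is a compact MI-FGSM sketch in plain PyTorch: instead of stepping on the raw gradient sign, it steps on the sign of an L1-normalized gradient momentum. The decay, step size, and step count are illustrative; the benchmark's own baseline implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mifgsm(model: nn.Module, images: torch.Tensor, labels: torch.Tensor,
           eps: float, steps: int = 10, decay: float = 1.0) -> torch.Tensor:
    """MI-FGSM: accumulate an L1-normalized gradient momentum, step by its sign."""
    alpha = eps / steps
    adv = images.clone()
    momentum = torch.zeros_like(images)
    for _ in range(steps):
        adv = adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        (grad,) = torch.autograd.grad(loss, adv)
        norm = grad.abs().mean(dim=(1, 2, 3), keepdim=True).clamp_min(1e-12)
        momentum = decay * momentum + grad / norm       # momentum smooths noisy gradients
        adv = adv.detach() + alpha * momentum.sign()
        adv = images + (adv - images).clamp(-eps, eps)  # project onto the eps-ball
        adv = adv.clamp(0.0, 1.0)                       # keep pixels valid
    return adv.detach()
```

The momentum term damps oscillation between iterations, which tends to help the attack transfer and escape poor local optima.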

Code

custom_attack.py
import torch
import torch.nn as nn

# =====================================================================
# EDITABLE: implement run_attack below
# =====================================================================
def run_attack(
    model: nn.Module,
    images: torch.Tensor,
    labels: torch.Tensor,
    eps: float,
    device: torch.device,
    n_classes: int,
) -> torch.Tensor:
    """
run_eval.py
"""Evaluation harness for white-box Linf adversarial attack task."""

import argparse
import copy
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms

from custom_attack import run_attack


def parse_args() -> argparse.Namespace:

Results

All values are ASR per scenario; partial duplicate rows from the flattened source have been merged.

| Model                         | Type     | ResNet20 C10 | VGG11-BN C10 | ResNet20 C100 | VGG11-BN C100 | MobileNetV2 C100 |
|-------------------------------|----------|--------------|--------------|---------------|---------------|------------------|
| autoattack                    | baseline | 1.000        | 0.935        | 1.000         | 0.948         | 0.999            |
| fgsm                          | baseline | 0.924        | 0.807        | 0.877         | 0.791         | 0.884            |
| mifgsm                        | baseline | 1.000        | 0.927        | 1.000         | 0.938         | 0.999            |
| pgd                           | baseline | 1.000        | 0.945        | 1.000         | 0.950         | 0.998            |
| anthropic/claude-opus-4.6     | vanilla  | 1.000        | 0.959        | 1.000         | 0.958         | 0.999            |
| google/gemini-3.1-pro-preview | vanilla  | 1.000        | 0.948        | 1.000         | 0.940         | 0.999            |
| gpt-5.4-pro                   | vanilla  | 1.000        | 0.952        | 1.000         | 0.954         | 0.999            |
| anthropic/claude-opus-4.6     | agent    | 1.000        | 0.963        | 1.000         | 0.961         | 0.999            |
| google/gemini-3.1-pro-preview | agent    | 1.000        | 0.956        | 1.000         | 0.955         | 0.999            |
| gpt-5.4-pro                   | agent    | 1.000        | 0.952        | 1.000         | 0.954         | 0.999            |

Agent Conversations