security-adversarial-attack-black-box-score
Adversarial ML, torchattacks, rigorous codebase
Description
Score-Based Query Black-Box Attack under Linf Constraint
Research Question
Can you design a stronger score-based query black-box attack that improves attack success rate (ASR) under a fixed query budget and L_inf perturbation constraint?
Objective
Implement a better black-box attack in bench/custom_attack.py:
- Threat model: query black-box (no gradient access).
- Constraint: ||x_adv - x||_inf <= eps.
- Budget: n_queries is a per-sample query budget.
- Primary metric: maximize ASR under fixed budget.
- Tie-break: for similar ASR, lower avg_queries is better.
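The L_inf constraint and the [0, 1] pixel range can both be enforced by a projection step applied to every candidate before it is queried or returned. The following is a minimal sketch, assuming PyTorch tensors; the helper name project_linf is hypothetical and is not part of the benchmark code, and eps = 8/255 is only an illustrative value.

```python
import torch


def project_linf(x_adv: torch.Tensor, x: torch.Tensor, eps: float) -> torch.Tensor:
    """Project x_adv onto the L_inf ball of radius eps around x, then onto [0, 1]."""
    # Clip the perturbation elementwise to [-eps, eps].
    delta = torch.clamp(x_adv - x, min=-eps, max=eps)
    # Clip the perturbed image to the valid pixel range. Because x is already
    # in [0, 1], this step can only move pixels closer to x.
    return torch.clamp(x + delta, min=0.0, max=1.0)


# Illustrative usage with random data (eps value is an assumption).
x = torch.rand(4, 3, 32, 32)
candidate = x + 0.5 * torch.randn_like(x)
x_adv = project_linf(candidate, x, eps=8 / 255)
```

Clipping the perturbation first and the pixel range second keeps both conditions satisfied at once, up to floating-point rounding.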
Editable Interface
You must implement:
run_attack(model, images, labels, eps, n_queries, device, n_classes) -> adv_images
Inputs:
- model: black-box wrapper that returns logits only.
- images: tensor of shape (N, C, H, W), in [0, 1].
- labels: tensor of shape (N,).
- n_classes: 10 for CIFAR-10, 100 for CIFAR-100.
Output:
- adv_images: tensor with same shape as images, values in [0, 1].
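To make the interface concrete, here is a minimal sketch of a run_attack that fits the signature above. It is a plain random-sign search written for illustration, not the benchmark's random_search baseline and not a strong attack. The margin loss, the 10% resampling rate, and the helper name margin are all assumptions.

```python
import torch


def margin(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """True-class logit minus the best other logit (negative means misclassified)."""
    rows = torch.arange(logits.shape[0], device=logits.device)
    true = logits[rows, y]
    other = logits.clone()
    other[rows, y] = float("-inf")
    return true - other.max(dim=1).values


@torch.no_grad()
def run_attack(model, images, labels, eps, n_queries, device, n_classes):
    """Illustrative random-sign search that uses at most n_queries per sample."""
    x, y = images.to(device), labels.to(device)
    if n_queries < 1:
        return x.clone()
    # Start at a random vertex of the L_inf ball: one query per sample.
    best = (x + eps * torch.sign(torch.randn_like(x))).clamp(0.0, 1.0)
    best_margin = margin(model(best), y)
    for _ in range(n_queries - 1):
        # Only samples that are still correctly classified are queried again.
        act = (best_margin >= 0).nonzero(as_tuple=True)[0]
        if act.numel() == 0:
            break
        xa = x[act]
        # Re-draw the perturbation sign on a random 10% of coordinates (assumed rate).
        flip = torch.rand_like(xa) < 0.1
        signs = torch.sign(torch.randn_like(xa))
        cand = torch.where(flip, xa + eps * signs, best[act]).clamp(0.0, 1.0)
        cand_margin = margin(model(cand), y[act])
        # Keep a candidate only if it lowers the margin.
        better = cand_margin < best_margin[act]
        best[act[better]] = cand[better]
        best_margin[act[better]] = cand_margin[better]
    return best
```

Dropping already-misclassified samples from later calls is what lowers avg_queries here. The n_classes argument is unused in this sketch.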
Trusted Evaluation Logic
The evaluation logic in bench/run_eval.py is trusted and not editable.
- It tracks all model queries through a wrapper.
- If a batch exceeds its query budget (batch_size * n_queries), the entire batch is marked as attack failure.
- L_inf and [0, 1] validity are checked per sample; only invalid samples are marked as attack failure.
Wrapper behavior and evaluation logic are fixed. Improvements should be confined to the attack algorithm in custom_attack.py.
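The two failure rules above can be mirrored locally as a sanity check before submitting. This is a sketch of the stated rules only, not the code in run_eval.py. The function name forced_failures and the numerical tolerance tol are assumptions, and the harness may use a different tolerance.

```python
import torch


def forced_failures(adv, x, eps, queries_used, n_queries, tol=1e-6):
    """Return a boolean mask of samples that must count as attack failures."""
    n = x.shape[0]
    # Batch-level rule: exceeding batch_size * n_queries fails every sample.
    if queries_used > n * n_queries:
        return torch.ones(n, dtype=torch.bool, device=x.device)
    flat_adv, flat_x = adv.reshape(n, -1), x.reshape(n, -1)
    # Sample-level rules: L_inf bound and [0, 1] range, checked per sample.
    linf_ok = (flat_adv - flat_x).abs().amax(dim=1) <= eps + tol
    range_ok = (flat_adv.amin(dim=1) >= 0.0) & (flat_adv.amax(dim=1) <= 1.0)
    return ~(linf_ok & range_ok)
```

Because the budget rule fails the whole batch, it is worth checking before the per-sample rules.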
Query Semantics
- One call to model(x) consumes x.shape[0] queries.
- Repeated calls on the same sample still consume additional queries.
- Splitting the same inputs into different batches does not change the total budget usage.
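These counting rules can be reproduced during development with a thin wrapper around any callable model. The class below is a hypothetical local stand-in (the name CountingModel is an assumption), not the harness's own wrapper. It relies only on the input exposing a shape attribute, as PyTorch tensors do.

```python
from typing import Any, Callable


class CountingModel:
    """Hypothetical local query counter; not the wrapper used by run_eval.py."""

    def __init__(self, model: Callable[[Any], Any]) -> None:
        self.model = model
        self.used = 0  # total queries consumed so far

    def __call__(self, x: Any) -> Any:
        # One forward call costs one query per row of the batch, and
        # repeated rows are charged again.
        self.used += x.shape[0]
        return self.model(x)

    def within_budget(self, batch_size: int, n_queries: int) -> bool:
        # The batch budget is batch_size * n_queries, as described above.
        return self.used <= batch_size * n_queries
```

Wrapping the model this way makes it easy to confirm that one call on 8 rows and two calls on 4 rows each are charged the same 8 queries.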
Evaluation Scenarios (6)
- ResNet20 on CIFAR-10
- VGG11-BN on CIFAR-10
- MobileNetV2 on CIFAR-10
- ResNet20 on CIFAR-100
- VGG11-BN on CIFAR-100
- MobileNetV2 on CIFAR-100
Reported metrics line format:
ATTACK_METRICS asr=... clean_acc=... robust_acc=... avg_queries=...
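A line in this format can be parsed with a short regular expression when collecting results across scenarios. The sketch below assumes the values are plain decimal numbers separated by whitespace. The function name parse_attack_metrics and the numbers in the sample line are illustrative and are not taken from any run.

```python
import re


def parse_attack_metrics(line: str) -> dict:
    """Parse 'ATTACK_METRICS key=value ...' into a dict of floats."""
    if not line.startswith("ATTACK_METRICS"):
        raise ValueError("not an ATTACK_METRICS line")
    # Each metric is a word, an equals sign, and a run of non-space characters.
    return {key: float(value) for key, value in re.findall(r"(\w+)=(\S+)", line)}


# Illustrative line with made-up values.
sample = "ATTACK_METRICS asr=0.5 clean_acc=0.9 robust_acc=0.45 avg_queries=100.0"
metrics = parse_attack_metrics(sample)
```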
Code
custom_attack.py
import torch
import torch.nn as nn

# =====================================================================
# EDITABLE: implement run_attack below
# =====================================================================
def run_attack(
    model: nn.Module,
    images: torch.Tensor,
    labels: torch.Tensor,
    eps: float,
    n_queries: int,
    device: torch.device,
    n_classes: int,
) -> torch.Tensor:
run_eval.py
"""Trusted evaluation harness for score-based query black-box attack task."""

import argparse
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms

from custom_attack import run_attack


class QueryLimitedBlackBox(torch.nn.Module):
    """Query-limited wrapper with no gradient path and budget tracking."""
Results
| Model | Type | asr ResNet20 C10 ↑ | avg queries ResNet20 C10 ↓ | asr MobileNetV2 C10 ↑ | avg queries MobileNetV2 C10 ↓ | asr ResNet20 C100 ↑ | avg queries ResNet20 C100 ↓ | asr MobileNetV2 C100 ↑ | avg queries MobileNetV2 C100 ↓ | asr VGG11BN C10 ↑ | avg queries VGG11BN C10 ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| random_search | baseline | 0.565 | 65.000 | 0.520 | 65.000 | 0.615 | 65.000 | 0.590 | 65.000 | 0.240 | 65.000 |
| spsa | baseline | 0.955 | 768.000 | 0.920 | 768.000 | 0.910 | 768.000 | 0.885 | 768.000 | 0.640 | 768.000 |
| square | baseline | 0.995 | 121.000 | 0.975 | 139.210 | 0.985 | 72.500 | 0.990 | 86.190 | 0.910 | 212.340 |
| anthropic/claude-opus-4.6 | vanilla | 1.000 | 118.390 | 1.000 | 146.320 | 1.000 | 84.420 | 1.000 | 106.450 | 0.975 | 811.720 |
| google/gemini-3.1-pro-preview | vanilla | 1.000 | 53.150 | 1.000 | 87.440 | 1.000 | 61.050 | 1.000 | 83.910 | 0.940 | 933.380 |
| gpt-5.4-pro | vanilla | 0.995 | 279.320 | 0.990 | 251.030 | 0.990 | 309.090 | 1.000 | 315.850 | 0.910 | 560.620 |
| anthropic/claude-opus-4.6 | agent | 1.000 | 118.390 | 1.000 | 146.320 | 1.000 | 84.420 | 1.000 | 106.450 | 0.975 | 811.720 |
| google/gemini-3.1-pro-preview | agent | 1.000 | 40.910 | 0.990 | 52.210 | 1.000 | 38.260 | 0.995 | 48.750 | 0.990 | 465.650 |
| gpt-5.4-pro | agent | 0.995 | 279.320 | 0.990 | 251.030 | 0.990 | 309.090 | 1.000 | 315.850 | 0.910 | 560.620 |