security-adversarial-attack-black-box-score

Adversarial ML, torchattacks, rigorous codebase

Description

Score-Based Query Black-Box Attack under L_inf Constraint

Research Question

Can you design a stronger score-based query black-box attack that improves attack success rate (ASR) under a fixed query budget and L_inf perturbation constraint?

Objective

Implement a better black-box attack in bench/custom_attack.py:

  • Threat model: query black-box (no gradient access).
  • Constraint: ||x_adv - x||_inf <= eps.
  • Budget: n_queries is a per-sample query budget.
  • Primary metric: maximize ASR under fixed budget.
  • Tie-break: for similar ASR, lower avg_queries is better.
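The constraint above can be enforced by projecting every candidate before it is queried or returned. The following is a minimal sketch, assuming PyTorch tensors in [0, 1]; the helper name project_linf is hypothetical and not part of the benchmark code.

```python
import torch


def project_linf(x_adv: torch.Tensor, x: torch.Tensor, eps: float) -> torch.Tensor:
    """Project x_adv onto the L_inf ball of radius eps around x, then onto [0, 1].

    Hypothetical helper for illustration; not part of bench/run_eval.py.
    """
    # Clamp the perturbation elementwise to [-eps, eps].
    delta = torch.clamp(x_adv - x, min=-eps, max=eps)
    # Clamp the perturbed image to the valid pixel range. Because x is already
    # in [0, 1], this step can only move values closer to x.
    return torch.clamp(x + delta, min=0.0, max=1.0)
```

Floating-point rounding can leave the result a hair outside the ball, so it may be worth comparing against whatever tolerance the trusted evaluator uses; that tolerance is not stated here.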

Editable Interface

You must implement:

run_attack(model, images, labels, eps, n_queries, device, n_classes) -> adv_images

Inputs:

  • model: black-box wrapper that returns logits only.
  • images: tensor of shape (N, C, H, W), in [0, 1].
  • labels: tensor of shape (N,).
  • n_classes: 10 for CIFAR-10, 100 for CIFAR-100.

Output:

  • adv_images: tensor with same shape as images, values in [0, 1].
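To make the interface concrete, here is a minimal sketch of a run_attack that satisfies the signature above using plain random search on a margin loss. It is not the Square Attack baseline, nor the method behind any reported result. The margin helper and the flip_prob value are assumptions chosen for illustration, and the sketch assumes n_queries >= 1 and integer (int64) labels.

```python
import torch


def margin(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """True-class logit minus the best other logit; negative means misclassified."""
    index = labels.view(-1, 1)
    true = logits.gather(1, index).squeeze(1)
    others = logits.clone()
    others.scatter_(1, index, float("-inf"))
    return true - others.max(dim=1).values


@torch.no_grad()
def run_attack(model, images, labels, eps, n_queries, device, n_classes):
    x = images.to(device)
    y = labels.to(device)
    # Start from a random vertex of the eps-ball: one query per sample.
    x_adv = (x + eps * torch.sign(torch.randn_like(x))).clamp(0.0, 1.0)
    best = margin(model(x_adv), y)
    used = 1  # queries spent so far on each still-active sample
    flip_prob = 0.05  # hypothetical fraction of coordinates resampled per step
    while used < n_queries:
        # Only query samples that are still classified correctly.
        active = (best > 0).nonzero(as_tuple=True)[0]
        if active.numel() == 0:
            break
        xa = x[active]
        # Resample a random subset of coordinates to a fresh +/- eps value.
        mask = torch.rand_like(xa) < flip_prob
        fresh = xa + eps * torch.sign(torch.randn_like(xa))
        cand = torch.where(mask, fresh, x_adv[active]).clamp(0.0, 1.0)
        m = margin(model(cand), y[active])
        used += 1
        # Keep a proposal only where it lowers the margin.
        improved = m < best[active]
        idx = active[improved]
        x_adv[idx] = cand[improved]
        best[idx] = m[improved]
    return x_adv
```

Dropping samples from the active set as soon as they are misclassified keeps each sample within n_queries and also lowers avg_queries, which matters for the tie-break.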

Trusted Evaluation Logic

The evaluation logic in bench/run_eval.py is trusted and not editable.

  • It tracks all model queries through a wrapper.
  • If a batch exceeds its query budget (batch_size * n_queries), every sample in that batch is marked as an attack failure.
  • The L_inf constraint and [0, 1] validity are checked per sample; only the samples that violate them are marked as attack failures.

Wrapper behavior and evaluation logic are fixed. Improvements should be confined to the attack algorithm in custom_attack.py.
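The failure rules above can be read as the following sketch. It is an illustration of the stated rules only, not the actual code in bench/run_eval.py; the function name forced_failures and the tolerance tol are assumptions.

```python
import torch


def forced_failures(x_adv, x, eps, queries_used, n_queries, tol=1e-6):
    """Return a boolean mask of samples that count as attack failures by rule.

    Hypothetical illustration; the real checks live in the trusted harness.
    """
    n = x.shape[0]
    # Batch-level rule: exceeding batch_size * n_queries fails every sample.
    if queries_used > n * n_queries:
        return torch.ones(n, dtype=torch.bool, device=x.device)
    # Per-sample rule 1: the perturbation must stay inside the L_inf ball.
    linf = (x_adv - x).flatten(1).abs().max(dim=1).values
    out_of_ball = linf > eps + tol
    # Per-sample rule 2: all pixel values must stay inside [0, 1].
    flat = x_adv.flatten(1)
    out_of_range = (flat.min(dim=1).values < 0.0) | (flat.max(dim=1).values > 1.0)
    return out_of_ball | out_of_range
```

The asymmetry is the practical point: one over-budget call costs the whole batch, while a constraint violation costs only the offending sample.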

Query Semantics

  • One call to model(x) consumes x.shape[0] queries.
  • Repeated calls on the same sample still consume additional queries.
  • Splitting the same inputs into different batches does not change the total budget usage; only the number of samples queried counts.
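These counting rules can be checked locally with a small stand-in for the trusted wrapper. The class below is a hypothetical sketch (CountingModel is not the benchmark's wrapper); it only mirrors the rule that each input row costs one query.

```python
import torch


class CountingModel(torch.nn.Module):
    """Hypothetical stand-in for the trusted wrapper: one query per input row."""

    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model
        self.queries = 0

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # A call on a batch of shape (B, ...) consumes B queries.
        self.queries += x.shape[0]
        return self.model(x)


net = CountingModel(torch.nn.Linear(4, 3))
x = torch.rand(8, 4)
net(x)                   # one batch of 8 rows: 8 queries
net(x[:4]); net(x[4:])   # same rows again in two batches: 8 more queries
print(net.queries)       # 16
```

Wrapping a local model this way makes it easy to assert that an attack stays within batch_size * n_queries before submitting it.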

Evaluation Scenarios (6)

  • ResNet20 on CIFAR-10
  • VGG11-BN on CIFAR-10
  • MobileNetV2 on CIFAR-10
  • ResNet20 on CIFAR-100
  • VGG11-BN on CIFAR-100
  • MobileNetV2 on CIFAR-100

Reported metrics line format:

ATTACK_METRICS asr=... clean_acc=... robust_acc=... avg_queries=...
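For collecting results across scenarios, a line in this format can be parsed with a few lines of Python. This is a sketch that assumes every value is a plain number; the function name parse_attack_metrics is hypothetical.

```python
import re


def parse_attack_metrics(line: str) -> dict[str, float]:
    """Parse an 'ATTACK_METRICS key=value ...' line into a dict of floats."""
    prefix = "ATTACK_METRICS"
    if not line.startswith(prefix):
        raise ValueError("not an ATTACK_METRICS line")
    # Each metric is a whitespace-separated key=value pair after the prefix.
    pairs = re.findall(r"(\w+)=(\S+)", line[len(prefix):])
    return {key: float(value) for key, value in pairs}
```

For example, parse_attack_metrics("ATTACK_METRICS asr=0.5 clean_acc=0.9 robust_acc=0.45 avg_queries=65.0") returns a dict with the four keys; the numbers in that call are made up for illustration and do not come from a real run.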

Code

custom_attack.py
import torch
import torch.nn as nn

# =====================================================================
# EDITABLE: implement run_attack below
# =====================================================================
def run_attack(
    model: nn.Module,
    images: torch.Tensor,
    labels: torch.Tensor,
    eps: float,
    n_queries: int,
    device: torch.device,
    n_classes: int,
) -> torch.Tensor:
run_eval.py
"""Trusted evaluation harness for score-based query black-box attack task."""

import argparse
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms

from custom_attack import run_attack


class QueryLimitedBlackBox(torch.nn.Module):
    """Query-limited wrapper with no gradient path and budget tracking."""

Results

| Model | Type | asr ResNet20 C10 | avg queries ResNet20 C10 | asr MobileNetV2 C10 | avg queries MobileNetV2 C10 | asr ResNet20 C100 | avg queries ResNet20 C100 | asr MobileNetV2 C100 | avg queries MobileNetV2 C100 | asr VGG11BN C10 | avg queries VGG11BN C10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| random_search | baseline | 0.565 | 65.000 | 0.520 | 65.000 | 0.615 | 65.000 | 0.590 | 65.000 | - | - |
| random_search | baseline | - | - | - | - | - | - | - | - | 0.240 | 65.000 |
| spsa | baseline | 0.955 | 768.000 | - | - | - | - | - | - | - | - |
| spsa | baseline | - | - | 0.920 | 768.000 | 0.910 | 768.000 | 0.885 | 768.000 | 0.640 | 768.000 |
| square | baseline | 0.995 | 121.000 | - | - | 0.985 | 72.500 | 0.990 | 86.190 | 0.910 | 212.340 |
| square | baseline | - | - | 0.975 | 139.210 | - | - | - | - | - | - |
| square | baseline | 0.995 | 121.000 | - | - | 0.985 | 72.500 | 0.990 | 86.190 | 0.910 | 212.340 |
| anthropic/claude-opus-4.6 | vanilla | 1.000 | 118.390 | 1.000 | 146.320 | 1.000 | 84.420 | 1.000 | 106.450 | 0.975 | 811.720 |
| google/gemini-3.1-pro-preview | vanilla | 1.000 | 53.150 | 1.000 | 87.440 | 1.000 | 61.050 | 1.000 | 83.910 | 0.940 | 933.380 |
| gpt-5.4-pro | vanilla | 0.995 | 279.320 | 0.990 | 251.030 | 0.990 | 309.090 | 1.000 | 315.850 | 0.910 | 560.620 |
| anthropic/claude-opus-4.6 | agent | 1.000 | 118.390 | 1.000 | 146.320 | 1.000 | 84.420 | 1.000 | 106.450 | 0.975 | 811.720 |
| google/gemini-3.1-pro-preview | agent | 1.000 | 40.910 | 0.990 | 52.210 | 1.000 | 38.260 | 0.995 | 48.750 | 0.990 | 465.650 |
| gpt-5.4-pro | agent | 0.995 | 279.320 | 0.990 | 251.030 | 0.990 | 309.090 | 1.000 | 315.850 | 0.910 | 560.620 |

Agent Conversations