security-membership-inference-defense
Description
Membership Inference Defense via Training Regularization
Research Question
How can we design a stronger training-time regularizer that reduces membership inference leakage while preserving standard predictive accuracy?
Background
Membership inference attacks exploit the gap between train and non-train examples, often through confidence or loss statistics. Many defenses regularize predictions to reduce overconfidence and shrink the train-test generalization gap, but stronger privacy often hurts utility.
Task
Implement an improved privacy-preserving training objective in `bench/membership/custom_membership_defense.py`. The fixed harness will train a model on a 50/50 train/non-train split of the full dataset, then run a confidence-based membership inference attack on train versus held-out examples.
Your method should improve the privacy-utility tradeoff: lower membership attack AUC while retaining high test accuracy.
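The harness's attack is confidence-based: examples the model is more confident on are predicted to be training members, and the attack is scored by AUC. As a minimal illustration (not the harness's actual implementation), the AUC of such a threshold-free attack can be computed directly as a Mann-Whitney U statistic over member and non-member confidences:

```python
def mia_auc(member_conf, nonmember_conf):
    """AUC of the attack that predicts 'member' for higher confidence.

    Computed as the Mann-Whitney U statistic: the fraction of
    (member, non-member) pairs where the member scores higher,
    counting ties as half a win. 0.5 means the attack is no better
    than random guessing.
    """
    wins = 0.0
    for m in member_conf:
        for n in nonmember_conf:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_conf) * len(nonmember_conf))
```

A defense succeeds when it pushes this AUC toward 0.5 without sacrificing test accuracy.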
Editable Interface
You must implement:
```python
class MembershipDefense:
    def compute_loss(self, logits, labels, epoch):
        ...
```

- `logits`: model outputs for the current minibatch
- `labels`: ground-truth labels
- `epoch`: current training epoch (0-indexed)
- Return value: scalar loss tensor used by the fixed training loop
The optimizer (SGD + CosineAnnealing), architecture, data pipeline, and attack implementation are fixed.
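As a minimal sketch of a defense satisfying this interface, the example below applies label smoothing that ramps up over training. The schedule and hyperparameters (`max_smoothing`, `warmup_epochs`) are illustrative assumptions, not values prescribed by the benchmark:

```python
import torch
import torch.nn.functional as F


class MembershipDefense:
    """Sketch: cross-entropy with label smoothing that ramps up over
    training, softening targets as the model starts to memorize."""

    def __init__(self, max_smoothing=0.2, warmup_epochs=10):
        self.max_smoothing = max_smoothing
        self.warmup_epochs = warmup_epochs

    def compute_loss(self, logits, labels, epoch):
        # Ramp smoothing linearly from 0 to max_smoothing over the
        # warmup window, then hold it constant.
        frac = min(epoch / max(self.warmup_epochs, 1), 1.0)
        smoothing = self.max_smoothing * frac
        return F.cross_entropy(logits, labels, label_smoothing=smoothing)
```

Smoothing raises the loss on confidently correct predictions, which is exactly the signal a confidence-based attack exploits.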
Evaluation
Benchmarks:
- `resnet20-cifar10`: ResNet-20 on CIFAR-10
- `vgg16bn-cifar100`: VGG-16-BN on CIFAR-100
- `mobilenetv2-fmnist`: MobileNetV2 on FashionMNIST
Reported metrics:
- `test_acc`
- `mia_auc`
- `privacy_gap`
- `privacy_score`
Primary metric: `privacy_score` (higher is better), defined as `test_acc - max(mia_auc - 0.5, 0)`.
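The definition above is a one-liner; spelling it out makes the behavior at the random-guessing boundary explicit:

```python
def privacy_score(test_acc, mia_auc):
    # Test accuracy minus the attack's advantage over random
    # guessing (AUC 0.5). An attack at or below 0.5 costs nothing.
    return test_acc - max(mia_auc - 0.5, 0.0)
```

For example, the `erm` baseline on ResNet-20/CIFAR-10 (test_acc 0.891, mia_auc 0.569) scores 0.891 - 0.069 = 0.822, matching the results table below.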
Baselines
- `erm`: standard cross-entropy training
- `label_smoothing`: smoothed targets to reduce overconfidence
- `confidence_penalty`: cross-entropy plus a predictive-entropy penalty
- `relaxloss`: margin-aware loss relaxation as a stronger privacy baseline
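As a rough sketch of the `confidence_penalty` idea (the bench's exact formulation and coefficient are not specified here, so `beta` is an assumption), the loss subtracts a multiple of the predictive entropy from cross-entropy, rewarding less peaked output distributions:

```python
import torch
import torch.nn.functional as F


def confidence_penalty_loss(logits, labels, beta=0.1):
    """Cross-entropy minus beta times mean predictive entropy.

    Subtracting entropy (equivalently, penalizing confidence)
    discourages the overconfident outputs that membership
    inference attacks thrive on.
    """
    ce = F.cross_entropy(logits, labels)
    log_probs = F.log_softmax(logits, dim=1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=1).mean()
    return ce - beta * entropy
```

Since entropy is always non-negative, any `beta > 0` lowers the loss relative to plain cross-entropy on the same batch while its gradient pushes toward softer predictions.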
Code
1"""Editable membership-inference defense for MLS-Bench."""23import torch4import torch.nn.functional as F56# ============================================================7# EDITABLE8# ============================================================9class MembershipDefense:10"""Training-time regularizer for privacy-utility tradeoffs.1112The compute_loss method replaces nn.CrossEntropyLoss() in the13fixed training loop. Design a loss that reduces membership14inference leakage (lower MIA AUC) while preserving test accuracy.15
1"""Fixed evaluation harness for security-membership-inference-defense.23Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST4to evaluate custom membership-inference defense losses.56FIXED: Model architectures, data pipeline, training loop, MIA evaluation.7EDITABLE: MembershipDefense.compute_loss() method.89Usage:10python run_membership_defense.py --arch resnet20 --dataset cifar10 --seed 4211python run_membership_defense.py --arch mobilenetv2 --dataset fmnist --seed 4212"""1314import argparse15import os
Results
| Model | Type | test acc resnet20 cifar10 ↑ | mia auc resnet20 cifar10 ↓ | privacy gap resnet20 cifar10 ↓ | privacy score resnet20 cifar10 ↑ | test acc vgg16bn cifar100 ↑ | mia auc vgg16bn cifar100 ↓ | privacy gap vgg16bn cifar100 ↓ | privacy score vgg16bn cifar100 ↑ | test acc mobilenetv2 fmnist ↑ | mia auc mobilenetv2 fmnist ↓ | privacy gap mobilenetv2 fmnist ↓ | privacy score mobilenetv2 fmnist ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| confidence_penalty | baseline | 0.884 | 0.570 | 0.051 | 0.814 | 0.570 | 0.640 | 0.127 | 0.430 | 0.943 | 0.517 | 0.012 | 0.927 |
| erm | baseline | 0.891 | 0.569 | 0.045 | 0.822 | 0.629 | 0.715 | 0.131 | 0.414 | 0.943 | 0.516 | 0.011 | 0.927 |
| label_smoothing | baseline | 0.892 | 0.603 | 0.057 | 0.789 | 0.458 | 0.550 | 0.036 | 0.408 | 0.940 | 0.520 | 0.010 | 0.920 |
| relaxloss | baseline | 0.888 | 0.600 | 0.034 | 0.788 | 0.402 | 0.549 | 0.011 | 0.353 | 0.944 | 0.527 | 0.006 | 0.918 |
| relaxloss | baseline | 0.104 | 0.786 | 0.000 | -0.182 | 0.010 | 0.001 | 0.000 | 0.010 | 0.100 | 0.001 | 0.000 | 0.100 |
| anthropic/claude-opus-4.6 | vanilla | 0.894 | 0.605 | 0.059 | 0.790 | 0.010 | 0.003 | 0.000 | 0.010 | 0.943 | 0.524 | 0.012 | 0.919 |
| deepseek-reasoner | vanilla | 0.890 | 0.565 | 0.052 | 0.825 | 0.587 | 0.666 | 0.122 | 0.420 | 0.942 | 0.515 | 0.012 | 0.927 |
| google/gemini-3.1-pro-preview | vanilla | 0.887 | 0.549 | 0.013 | 0.838 | 0.646 | 0.697 | 0.010 | 0.449 | 0.941 | 0.514 | 0.003 | 0.928 |
| openai/gpt-5.4-pro | vanilla | 0.894 | 0.595 | 0.046 | 0.798 | 0.632 | 0.733 | 0.136 | 0.399 | 0.943 | 0.519 | 0.007 | 0.924 |
| qwen3.6-plus:free | vanilla | 0.894 | 0.605 | 0.050 | 0.790 | 0.622 | 0.729 | 0.166 | 0.394 | 0.942 | 0.523 | 0.009 | 0.919 |
| anthropic/claude-opus-4.6 | agent | 0.884 | 0.580 | 0.022 | 0.804 | 0.578 | 0.663 | 0.125 | 0.414 | 0.936 | 0.515 | 0.002 | 0.921 |
| deepseek-reasoner | agent | 0.887 | 0.600 | 0.043 | 0.788 | 0.010 | 0.003 | 0.000 | 0.010 | 0.942 | 0.526 | 0.009 | 0.916 |
| google/gemini-3.1-pro-preview | agent | 0.891 | 0.544 | 0.016 | 0.847 | 0.645 | 0.689 | 0.024 | 0.456 | 0.938 | 0.513 | 0.002 | 0.925 |
| openai/gpt-5.4-pro | agent | 0.894 | 0.595 | 0.046 | 0.798 | 0.632 | 0.733 | 0.136 | 0.399 | 0.943 | 0.519 | 0.007 | 0.924 |
| qwen3.6-plus:free | agent | 0.887 | 0.582 | 0.062 | 0.805 | - | - | - | - | - | - | - | - |