security-membership-inference-defense
Description
Membership Inference Defense via Training Regularization
Research Question
How can we design a stronger training-time regularizer that reduces membership inference leakage while preserving standard predictive accuracy?
Background
Membership inference attacks exploit the gap between train and non-train examples, often through confidence or loss statistics. Many defenses regularize predictions to reduce overconfidence and shrink the train-test generalization gap, but stronger privacy often hurts utility.
Task
Implement an improved privacy-preserving training objective in `bench/membership/custom_membership_defense.py`. The fixed harness will train a model on a 50/50 train/non-train split of the full dataset, then run a confidence-based membership inference attack on train versus held-out examples.
Your method should improve the privacy-utility tradeoff: lower membership attack AUC while retaining high test accuracy.
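The harness's attack is confidence-based: examples the model is more confident on are predicted to be training members, and the attack is scored by AUC. As a minimal illustration (not the harness's actual implementation), the AUC of such a threshold-free attack can be computed directly as a Mann-Whitney U statistic over member and non-member confidences:

```python
def mia_auc(member_conf, nonmember_conf):
    """AUC of the attack that predicts 'member' for higher confidence.

    Computed as the Mann-Whitney U statistic: the fraction of
    (member, non-member) pairs where the member scores higher,
    counting ties as half a win. 0.5 means the attack is no better
    than random guessing.
    """
    wins = 0.0
    for m in member_conf:
        for n in nonmember_conf:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_conf) * len(nonmember_conf))
```

A defense succeeds when it pushes this AUC toward 0.5 without sacrificing test accuracy.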
Editable Interface
You must implement:
```python
class MembershipDefense:
    def compute_loss(self, logits, labels, epoch):
        ...
```

- `logits`: model outputs for the current minibatch
- `labels`: ground-truth labels
- `epoch`: current training epoch (0-indexed)
- Return value: scalar loss tensor used by the fixed training loop
The optimizer (SGD + CosineAnnealing), architecture, data pipeline, and attack implementation are fixed.
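As a minimal sketch of a defense satisfying this interface, the example below applies label smoothing that ramps up over training. The schedule and hyperparameters (`max_smoothing`, `warmup_epochs`) are illustrative assumptions, not values prescribed by the benchmark:

```python
import torch
import torch.nn.functional as F


class MembershipDefense:
    """Sketch: cross-entropy with label smoothing that ramps up over
    training, softening targets as the model starts to memorize."""

    def __init__(self, max_smoothing=0.2, warmup_epochs=10):
        self.max_smoothing = max_smoothing
        self.warmup_epochs = warmup_epochs

    def compute_loss(self, logits, labels, epoch):
        # Ramp smoothing linearly from 0 to max_smoothing over the
        # warmup window, then hold it constant.
        frac = min(epoch / max(self.warmup_epochs, 1), 1.0)
        smoothing = self.max_smoothing * frac
        return F.cross_entropy(logits, labels, label_smoothing=smoothing)
```

Smoothing raises the loss on confidently correct predictions, which is exactly the signal a confidence-based attack exploits.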
Evaluation
Benchmarks:
- `resnet20-cifar10`: ResNet-20 on CIFAR-10
- `vgg16bn-cifar100`: VGG-16-BN on CIFAR-100
- `mobilenetv2-fmnist`: MobileNetV2 on FashionMNIST
Reported metrics:
- `test_acc`
- `mia_auc`
- `privacy_gap`
- `privacy_score`
Primary metric: `privacy_score` (higher is better), defined as `test_acc - max(mia_auc - 0.5, 0)`.
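The definition above is a one-liner; spelling it out makes the behavior at the random-guessing boundary explicit:

```python
def privacy_score(test_acc, mia_auc):
    # Test accuracy minus the attack's advantage over random
    # guessing (AUC 0.5). An attack at or below 0.5 costs nothing.
    return test_acc - max(mia_auc - 0.5, 0.0)
```

For example, the `erm` baseline on ResNet-20/CIFAR-10 (test_acc 0.891, mia_auc 0.569) scores 0.891 - 0.069 = 0.822, matching the results table below.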
Baselines
- `erm`: standard cross-entropy training
- `label_smoothing`: smoothed targets to reduce overconfidence
- `confidence_penalty`: cross-entropy plus a predictive-entropy penalty
- `relaxloss`: margin-aware loss relaxation as a stronger privacy baseline
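As a rough sketch of the `confidence_penalty` idea (the bench's exact formulation and coefficient are not specified here, so `beta` is an assumption), the loss subtracts a multiple of the predictive entropy from cross-entropy, rewarding less peaked output distributions:

```python
import torch
import torch.nn.functional as F


def confidence_penalty_loss(logits, labels, beta=0.1):
    """Cross-entropy minus beta times mean predictive entropy.

    Subtracting entropy (equivalently, penalizing confidence)
    discourages the overconfident outputs that membership
    inference attacks thrive on.
    """
    ce = F.cross_entropy(logits, labels)
    log_probs = F.log_softmax(logits, dim=1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=1).mean()
    return ce - beta * entropy
```

Since entropy is always non-negative, any `beta > 0` lowers the loss relative to plain cross-entropy on the same batch while its gradient pushes toward softer predictions.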
Code
1"""Editable membership-inference defense for MLS-Bench."""23import torch4import torch.nn.functional as F56# ============================================================7# EDITABLE8# ============================================================9class MembershipDefense:10"""Training-time regularizer for privacy-utility tradeoffs.1112The compute_loss method replaces nn.CrossEntropyLoss() in the13fixed training loop. Design a loss that reduces membership14inference leakage (lower MIA AUC) while preserving test accuracy.15
1"""Fixed evaluation harness for security-membership-inference-defense.23Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST4to evaluate custom membership-inference defense losses.56FIXED: Model architectures, data pipeline, training loop, MIA evaluation.7EDITABLE: MembershipDefense.compute_loss() method.89Usage:10python run_membership_defense.py --arch resnet20 --dataset cifar10 --seed 4211python run_membership_defense.py --arch mobilenetv2 --dataset fmnist --seed 4212"""1314import argparse15import os
Results
| Model | Type | test acc resnet20 cifar10 ↑ | mia auc resnet20 cifar10 ↓ | privacy gap resnet20 cifar10 ↓ | privacy score resnet20 cifar10 ↑ | test acc vgg16bn cifar100 ↑ | mia auc vgg16bn cifar100 ↓ | privacy gap vgg16bn cifar100 ↓ | privacy score vgg16bn cifar100 ↑ | test acc mobilenetv2 fmnist ↑ | mia auc mobilenetv2 fmnist ↓ | privacy gap mobilenetv2 fmnist ↓ | privacy score mobilenetv2 fmnist ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| confidence_penalty | baseline | 0.884 | 0.570 | 0.051 | 0.814 | 0.570 | 0.640 | 0.127 | 0.430 | 0.943 | 0.517 | 0.012 | 0.927 |
| erm | baseline | 0.891 | 0.569 | 0.045 | 0.822 | 0.629 | 0.715 | 0.131 | 0.414 | 0.943 | 0.516 | 0.011 | 0.927 |
| label_smoothing | baseline | 0.892 | 0.603 | 0.057 | 0.789 | 0.458 | 0.550 | 0.036 | 0.408 | 0.940 | 0.520 | 0.010 | 0.920 |
| relaxloss | baseline | 0.888 | 0.600 | 0.034 | 0.788 | 0.402 | 0.549 | 0.011 | 0.353 | 0.944 | 0.527 | 0.006 | 0.918 |
| relaxloss | baseline | 0.104 | 0.786 | 0.000 | -0.182 | 0.010 | 0.001 | 0.000 | 0.010 | 0.100 | 0.001 | 0.000 | 0.100 |
| anthropic/claude-opus-4.6 | vanilla | 0.894 | 0.605 | 0.059 | 0.790 | 0.010 | 0.003 | 0.000 | 0.010 | 0.943 | 0.524 | 0.012 | 0.919 |
| deepseek-reasoner | vanilla | 0.890 | 0.565 | 0.052 | 0.825 | 0.587 | 0.666 | 0.122 | 0.420 | 0.942 | 0.515 | 0.012 | 0.927 |
| google/gemini-3.1-pro-preview | vanilla | 0.887 | 0.549 | 0.013 | 0.838 | 0.646 | 0.697 | 0.010 | 0.449 | 0.941 | 0.514 | 0.003 | 0.928 |
| openai/gpt-5.4-pro | vanilla | 0.894 | 0.595 | 0.046 | 0.798 | 0.632 | 0.733 | 0.136 | 0.399 | 0.943 | 0.519 | 0.007 | 0.924 |
| qwen3.6-plus:free | vanilla | 0.894 | 0.605 | 0.050 | 0.790 | 0.622 | 0.729 | 0.166 | 0.394 | 0.942 | 0.523 | 0.009 | 0.919 |
| anthropic/claude-opus-4.6 | agent | 0.884 | 0.580 | 0.022 | 0.804 | 0.578 | 0.663 | 0.125 | 0.414 | 0.936 | 0.515 | 0.002 | 0.921 |
| deepseek-reasoner | agent | 0.887 | 0.600 | 0.043 | 0.788 | 0.010 | 0.003 | 0.000 | 0.010 | 0.942 | 0.526 | 0.009 | 0.916 |
| google/gemini-3.1-pro-preview | agent | 0.891 | 0.544 | 0.016 | 0.847 | 0.645 | 0.689 | 0.024 | 0.456 | 0.938 | 0.513 | 0.002 | 0.925 |
| openai/gpt-5.4-pro | agent | 0.894 | 0.595 | 0.046 | 0.798 | 0.632 | 0.733 | 0.136 | 0.399 | 0.943 | 0.519 | 0.007 | 0.924 |
| qwen3.6-plus:free | agent | 0.887 | 0.582 | 0.062 | 0.805 | - | - | - | - | - | - | - | - |