security-membership-inference-defense

Adversarial ML · pytorch-vision · rigorous codebase

Description

Membership Inference Defense via Training Regularization

Research Question

How can we design a stronger training-time regularizer that reduces membership inference leakage while preserving standard predictive accuracy?

Background

Membership inference attacks exploit the gap between train and non-train examples, often through confidence or loss statistics. Many defenses regularize predictions to reduce overconfidence and shrink the train-test generalization gap, but stronger privacy often hurts utility.

Task

Implement a better privacy-preserving training objective in bench/membership/custom_membership_defense.py. The fixed harness will train a model on a 50/50 train/non-train split of the full dataset, then run a confidence-based membership inference attack on train versus held-out examples.

Your method should improve the privacy-utility tradeoff: lower membership attack AUC while retaining high test accuracy.
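To make the evaluation concrete, here is a hedged sketch of a confidence-based membership inference attack of the kind the harness runs: score each example by its top softmax confidence and measure how well that score separates train members from held-out non-members using the ROC AUC (Mann-Whitney) statistic. The function name and the toy scores below are illustrative, not the harness's actual implementation.

```python
def attack_auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member
    (ties count half) -- the Mann-Whitney / ROC AUC statistic."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))

# An overfit model is more confident on its training data, so the
# attack separates members from non-members well above chance (0.5):
members = [0.99, 0.97, 0.95, 0.90]      # top softmax confidence, train set
nonmembers = [0.80, 0.85, 0.70, 0.92]   # top softmax confidence, held out
print(attack_auc(members, nonmembers))  # → 0.9375
```

A perfectly private model would drive this statistic toward 0.5, which is why the defenses below all try to shrink the confidence gap between train and non-train examples.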

Editable Interface

You must implement:

class MembershipDefense:
    def compute_loss(self, logits, labels, epoch):
        ...
  • logits: model outputs for the current minibatch
  • labels: ground-truth labels
  • epoch: current training epoch (0-indexed)
  • Return value: scalar loss tensor used by the fixed training loop

The optimizer (SGD + CosineAnnealing), architecture, data pipeline, and attack implementation are fixed.
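As an illustration of the interface, a minimal defense might ramp label smoothing over training, using the epoch argument the fixed loop passes in. The class and method names match the required interface above, but the hyperparameters and schedule (max_smoothing, ramp_epochs) are assumptions for this sketch, not a tuned method.

```python
import torch
import torch.nn.functional as F

class MembershipDefense:
    """Illustrative sketch only: cross-entropy whose label smoothing
    grows with the epoch, on the intuition that memorization (and hence
    membership leakage) accumulates late in training."""

    def __init__(self, max_smoothing=0.2, ramp_epochs=50):
        self.max_smoothing = max_smoothing  # assumed hyperparameter
        self.ramp_epochs = ramp_epochs      # assumed hyperparameter

    def compute_loss(self, logits, labels, epoch):
        # Linearly ramp the smoothing coefficient, capped at max_smoothing.
        eps = self.max_smoothing * min(epoch / self.ramp_epochs, 1.0)
        # Returns the scalar loss tensor the fixed training loop expects.
        return F.cross_entropy(logits, labels, label_smoothing=eps)
```

Because the optimizer and training loop are fixed, any design effort goes entirely into shaping this one scalar.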

Evaluation

Benchmarks:

  • resnet20-cifar10: ResNet-20 on CIFAR-10
  • vgg16bn-cifar100: VGG-16-BN on CIFAR-100
  • mobilenetv2-fmnist: MobileNetV2 on FashionMNIST

Reported metrics:

  • test_acc
  • mia_auc
  • privacy_gap
  • privacy_score

Primary metric: privacy_score (higher is better), defined as test_acc - max(mia_auc - 0.5, 0).
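The definition can be written out directly; since only AUC above chance (0.5) is penalized, an attack at or below chance costs nothing:

```python
def privacy_score(test_acc, mia_auc):
    """Primary metric: test accuracy minus the attack's advantage over chance."""
    return test_acc - max(mia_auc - 0.5, 0.0)

# An attack at chance level incurs no penalty:
print(round(privacy_score(0.891, 0.500), 3))  # → 0.891
# A leaky model pays for every point of AUC above 0.5:
print(round(privacy_score(0.891, 0.569), 3))  # → 0.822
```

The second call reproduces the erm baseline's resnet20-cifar10 row in the results table below.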

Baselines

  • erm: standard cross-entropy training
  • label_smoothing: smoothed targets to reduce overconfidence
  • confidence_penalty: cross-entropy plus predictive entropy penalty
  • relaxloss: margin-aware loss relaxation as a stronger privacy baseline
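The two simplest baselines can be sketched in a few lines of PyTorch. The weights (eps, beta) are illustrative defaults, not the values the harness actually uses, and relaxloss is omitted because its loss-relaxation schedule is more involved.

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, labels, eps=0.1):
    # Smoothed targets bound how confident the model can become,
    # which blunts confidence-based membership attacks.
    return F.cross_entropy(logits, labels, label_smoothing=eps)

def confidence_penalty_loss(logits, labels, beta=0.1):
    # Cross-entropy plus a penalty on low predictive entropy:
    # subtracting beta * entropy rewards less peaked output distributions.
    log_probs = F.log_softmax(logits, dim=1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=1).mean()
    return F.cross_entropy(logits, labels) - beta * entropy
```

Both push in the same direction: reduce the overconfidence gap between train and non-train examples, at some cost in fit to the training data.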

Code

custom_membership_defense.py
(editable)

"""Editable membership-inference defense for MLS-Bench."""

import torch
import torch.nn.functional as F

# ============================================================
# EDITABLE
# ============================================================
class MembershipDefense:
    """Training-time regularizer for privacy-utility tradeoffs.

    The compute_loss method replaces nn.CrossEntropyLoss() in the
    fixed training loop. Design a loss that reduces membership
    inference leakage (lower MIA AUC) while preserving test accuracy.
run_membership_defense.py
(read-only)

"""Fixed evaluation harness for security-membership-inference-defense.

Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST
to evaluate custom membership-inference defense losses.

FIXED: Model architectures, data pipeline, training loop, MIA evaluation.
EDITABLE: MembershipDefense.compute_loss() method.

Usage:
    python run_membership_defense.py --arch resnet20 --dataset cifar10 --seed 42
    python run_membership_defense.py --arch mobilenetv2 --dataset fmnist --seed 42
"""

import argparse
import os

Results

Metric columns repeat per benchmark, in the order resnet20-cifar10, vgg16bn-cifar100, mobilenetv2-fmnist: test_acc / mia_auc / privacy_gap / privacy_score.

| Model | Type | test_acc | mia_auc | priv_gap | priv_score | test_acc | mia_auc | priv_gap | priv_score | test_acc | mia_auc | priv_gap | priv_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| confidence_penalty | baseline | 0.884 | 0.570 | 0.051 | 0.814 | 0.570 | 0.640 | 0.127 | 0.430 | 0.943 | 0.517 | 0.012 | 0.927 |
| erm | baseline | 0.891 | 0.569 | 0.045 | 0.822 | 0.629 | 0.715 | 0.131 | 0.414 | 0.943 | 0.516 | 0.011 | 0.927 |
| label_smoothing | baseline | 0.892 | 0.603 | 0.057 | 0.789 | 0.458 | 0.550 | 0.036 | 0.408 | 0.940 | 0.520 | 0.010 | 0.920 |
| relaxloss | baseline | 0.888 | 0.600 | 0.034 | 0.788 | 0.402 | 0.549 | 0.011 | 0.353 | 0.944 | 0.527 | 0.006 | 0.918 |
| relaxloss | baseline | 0.104 | 0.786 | 0.000 | -0.182 | 0.010 | 0.001 | 0.000 | 0.010 | 0.100 | 0.001 | 0.000 | 0.100 |
| anthropic/claude-opus-4.6 | vanilla | 0.894 | 0.605 | 0.059 | 0.790 | 0.010 | 0.003 | 0.000 | 0.010 | 0.943 | 0.524 | 0.012 | 0.919 |
| deepseek-reasoner | vanilla | 0.890 | 0.565 | 0.052 | 0.825 | 0.587 | 0.666 | 0.122 | 0.420 | 0.942 | 0.515 | 0.012 | 0.927 |
| google/gemini-3.1-pro-preview | vanilla | 0.887 | 0.549 | 0.013 | 0.838 | 0.646 | 0.697 | 0.010 | 0.449 | 0.941 | 0.514 | 0.003 | 0.928 |
| openai/gpt-5.4-pro | vanilla | 0.894 | 0.595 | 0.046 | 0.798 | 0.632 | 0.733 | 0.136 | 0.399 | 0.943 | 0.519 | 0.007 | 0.924 |
| qwen3.6-plus:free | vanilla | 0.894 | 0.605 | 0.050 | 0.790 | 0.622 | 0.729 | 0.166 | 0.394 | 0.942 | 0.523 | 0.009 | 0.919 |
| anthropic/claude-opus-4.6 | agent | 0.884 | 0.580 | 0.022 | 0.804 | 0.578 | 0.663 | 0.125 | 0.414 | 0.936 | 0.515 | 0.002 | 0.921 |
| deepseek-reasoner | agent | 0.887 | 0.600 | 0.043 | 0.788 | 0.010 | 0.003 | 0.000 | 0.010 | 0.942 | 0.526 | 0.009 | 0.916 |
| google/gemini-3.1-pro-preview | agent | 0.891 | 0.544 | 0.016 | 0.847 | 0.645 | 0.689 | 0.024 | 0.456 | 0.938 | 0.513 | 0.002 | 0.925 |
| openai/gpt-5.4-pro | agent | 0.894 | 0.595 | 0.046 | 0.798 | 0.632 | 0.733 | 0.136 | 0.399 | 0.943 | 0.519 | 0.007 | 0.924 |
| qwen3.6-plus:free | agent | 0.887 | 0.582 | 0.062 | 0.805 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |

Agent Conversations