Agent Conversation: deepseek-reasoner
Task: security-membership-inference-defense
Membership Inference Defense via Training Regularization
Research Question
How can we design a stronger training-time regularizer that reduces membership inference leakage while preserving standard predictive accuracy?
Background
Membership inference attacks exploit the gap between train and non-train examples, often through confidence or loss statistics. Many defenses regularize predictions to reduce overconfidence and shrink the train-test generalization gap, but stronger privacy often hurts utility.
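The confidence-based attack described above can be sketched as follows. This is an illustrative toy, not the benchmark's fixed attack implementation; the function name `attack_auc` and the pairwise AUC computation are assumptions for exposition.

```python
import torch

def attack_auc(member_logits, nonmember_logits):
    """ROC AUC of a threshold attack that scores examples by max-softmax confidence."""
    conf_m = torch.softmax(member_logits, dim=1).max(dim=1).values
    conf_n = torch.softmax(nonmember_logits, dim=1).max(dim=1).values
    # AUC equals P(member score > non-member score) + 0.5 * P(tie),
    # averaged over all member/non-member pairs.
    diff = conf_m.unsqueeze(1) - conf_n.unsqueeze(0)
    return ((diff > 0).float() + 0.5 * (diff == 0).float()).mean().item()

# An overconfident model separates members cleanly, so AUC approaches 1;
# identical confidence distributions sit at the chance level of 0.5.
members = torch.tensor([[4.0, 0.0], [5.0, 0.0]])
nonmembers = torch.tensor([[0.5, 0.0], [0.2, 0.0]])
```

Shrinking the train/non-train confidence gap is exactly what pushes this AUC back toward 0.5.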
Task
Implement a better privacy-preserving training objective in bench/membership/custom_membership_defense.py. The fixed harness will train a model on a fixed dataset split, then run a confidence-based membership inference attack on train versus held-out examples.
Your method should improve the privacy-utility tradeoff: lower membership attack AUC while retaining high test accuracy.
Editable Interface
You must implement:
class MembershipDefense:
    def compute_loss(self, logits, labels, model, epoch):
        ...
- logits: model outputs for the current minibatch
- labels: ground-truth labels
- model: current model instance, exposed for optional parameter regularization
- epoch: current training epoch
- Return value: scalar loss tensor used by the fixed training loop
The optimizer, architecture, batch schedule, and attack implementation are fixed.
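A minimal implementation of this interface, equivalent to the `erm` baseline (plain cross-entropy), makes the contract concrete; the example inputs below are illustrative, not part of the harness.

```python
import torch
import torch.nn.functional as F

class MembershipDefense:
    """Minimal defense: plain cross-entropy, i.e. the `erm` baseline."""

    def compute_loss(self, logits, labels, model, epoch):
        # `model` and `epoch` are accepted but unused here; richer defenses
        # can regularize parameters or anneal weights over epochs.
        return F.cross_entropy(logits, labels)

# The fixed training loop backpropagates through the returned scalar.
logits = torch.tensor([[2.0, 0.0], [0.0, 2.0]])
labels = torch.tensor([0, 1])
loss = MembershipDefense().compute_loss(logits, labels, model=None, epoch=0)
```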
Evaluation
Benchmarks:
mnist, fmnist, cifar10
Reported metrics:
test_acc, mia_auc, privacy_gap, privacy_score
Primary metric: privacy_score (higher is better).
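The harness does not print the formula for privacy_score, but every result reported later in this log is consistent with the following definition; treat it as an inferred assumption, not documented behavior.

```python
def privacy_score(test_acc, mia_auc):
    # Utility minus the attack's advantage over random guessing (AUC 0.5),
    # clamped so a below-chance attack does not inflate the score.
    return test_acc - max(0.0, mia_auc - 0.5)

# Spot checks against runs in this log:
# mnist run:   test_acc=0.8390, mia_auc=0.5609 -> 0.7781
# cifar10 run: test_acc=0.4360, mia_auc=0.4990 -> 0.4360
```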
Baselines
- erm: standard cross-entropy training
- label_smoothing: smoothed targets to reduce overconfidence
- confidence_penalty: cross-entropy plus predictive entropy penalty
- relaxloss: margin-aware loss relaxation as a stronger privacy baseline
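The loss-relaxation idea behind the `relaxloss` baseline can be sketched as below. This is a deliberate simplification under assumptions (the function name `relaxed_ce` and threshold `alpha` are illustrative); the actual baseline may differ in detail.

```python
import torch
import torch.nn.functional as F

def relaxed_ce(logits, labels, alpha=0.5):
    """Stop minimizing once per-example loss drops below a target alpha."""
    ce = F.cross_entropy(logits, labels, reduction="none")
    # Examples with loss already below `alpha` get their sign flipped,
    # pushing their loss back up instead of driving it toward zero and
    # flattening the member loss distribution the attack exploits.
    relaxed = torch.where(ce < alpha, -ce, ce)
    return relaxed.mean()

confident = torch.tensor([[3.0, 0.0]])  # per-example CE ~0.049, below alpha
uncertain = torch.tensor([[0.0, 0.0]])  # per-example CE ~0.693, above alpha
labels = torch.tensor([0])
```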
opacus/bench/membership/custom_membership_defense.py [EDITABLE — entire file only]
1: """Editable membership-inference defense for MLS-Bench."""
2:
3: import torch
4: import torch.nn.functional as F
5:
6: # ============================================================
7: # EDITABLE
8: # ============================================================
9: class MembershipDefense:
10: """Training-time regularizer for privacy-utility tradeoffs."""
11:
12: def __init__(self):
13: self.label_smoothing = 0.0
14: self.entropy_weight = 0.0
15:
16: def compute_loss(self, logits, labels, model, epoch):
17: ce = F.cross_entropy(logits, labels, label_smoothing=self.label_smoothing)
18: probs = torch.softmax(logits, dim=1)
19: entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1).mean()
20: return ce - self.entropy_weight * entropy
21: # ============================================================
Tool Input
1{
2 "op": "replace",
3 "filename": "opacus/bench/membership/custom_membership_defense.py",
4 "start_line": 1,
5 "end_line": 23,
6 "content": "\"\"\"Editable membership-inference defense for MLS-Bench.\"\"\"\n\nimport torch\nimport torch.nn.functional as F\nimport numpy as np\n\n# ============================================================\n# EDITABLE\n# ============================================================\nclass MembershipDefense:\n \"\"\"Training-time regularizer for privacy-utility tradeoffs.\"\"\"\n\n def __init__(self):\n self.label_smoothing = 0.1\n self.entropy_weight = 0.05\n self.mixup_alpha = 0.2 # Beta distribution parameter for mixup\n \n def compute_loss(self, logits, labels, model, epoch):\n # Mixup regularization\n batch_size = logits.size(0)\n indices = torch.randperm(batch_size, device=logits.device)\n lam = torch.distributions.beta.Beta(self.mixup_alpha, self.mixup_alpha).sample((batch_size,)).to(logits.device)\n # Ensure lam is in [0,1] and shape (batch_size,1) for broadcasting\n lam = lam.view(-1, 1)\n mixed_logits = lam * logits + (1 - lam) * logits[indices]\n \n # Convert labels to one-hot\n num_classes = logits.size(1)\n labels_onehot = F.one_hot(labels, num_classes).float()\n labels_mixed = lam * labels_onehot + (1 - lam) * labels_onehot[indices]\n \n # Cross-entropy with label smoothing on mixed targets\n ce = F.cross_entropy(mixed_logits, labels_mixed, label_smoothing=self.label_smoothing)\n \n # Entropy bonus on original logits (optional)\n probs = torch.softmax(logits, dim=1)\n entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1).mean()\n \n total_loss = ce - self.entropy_weight * entropy\n return total_loss\n# ============================================================\n# END EDITABLE\n# ============================================================"
7}
OK: Replaced lines 1..23 with 43 line(s) in opacus/bench/membership/custom_membership_defense.py. Editable range: entire file.
[Current file: opacus/bench/membership/custom_membership_defense.py | editable: 1–43 | total: 43 lines]
1 """Editable membership-inference defense for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5 import numpy as np
6
7 # ============================================================
8 # EDITABLE
9 # ============================================================
10 class MembershipDefense:
11 """Training-time regularizer for privacy-utility tradeoffs."""
12
13 def __init__(self):
14 self.label_smoothing = 0.1
15 self.entropy_weight = 0.05
16 self.mixup_alpha = 0.2 # Beta distribution parameter for mixup
17
18 def compute_loss(self, logits, labels, model, epoch):
19 # Mixup regularizati
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
## Seed 42
### mnist (scripts/mnist.sh)
TRAIN_METRICS epoch=1 loss=2.1891
TRAIN_METRICS epoch=2 loss=2.0105
TRAIN_METRICS epoch=3 loss=1.6003
TRAIN_METRICS epoch=4 loss=1.2672
TRAIN_METRICS epoch=5 loss=1.0963
mnist: test_acc=0.8390, mia_auc=0.5609, privacy_gap=0.0423, privacy_score=0.7781
### fmnist (scripts/fmnist.sh)
TRAIN_METRICS epoch=1 loss=2.1630
TRAIN_METRICS epoch=2 loss=1.6743
TRAIN_METRICS epoch=3 loss=1.3761
TRAIN_METRICS epoch=4 loss=1.2429
TRAIN_METRICS epoch=5 loss=1.1756
fmnist: test_acc=0.7395, mia_auc=0.5212, privacy_gap=0.0157, privacy_score=0.7183
### cifar10 (scripts/cifar10.sh)
TRAIN_METRICS epoch=1 loss=2.1637
TRAIN_METRICS epoch=2 loss=2.0387
TRAIN_METRICS epoch=3 loss=1.9829
TRAIN_METRICS epoch=4 loss=1.9392
TRAIN_METRICS epoch=5 loss=1.9037
TRAIN_METRICS epoch=6 loss=1.9035
cifar10: test_acc=0.3230, mia_auc=0.5017, privacy_gap=0.0012, privacy_score=0.3213
## Seed 123
### mnist (scrip
Tool Input
1{
2 "n": 1
3}
Undo complete: Restored: opacus/bench/membership/custom_membership_defense.py
Tool Input
1{
2 "op": "replace",
3 "filename": "opacus/bench/membership/custom_membership_defense.py",
4 "start_line": 1,
5 "end_line": 23,
6 "content": "\"\"\"Editable membership-inference defense for MLS-Bench.\"\"\"\n\nimport torch\nimport torch.nn.functional as F\n\n# ============================================================\n# EDITABLE\n# ============================================================\nclass MembershipDefense:\n \"\"\"Training-time regularizer for privacy-utility tradeoffs.\"\"\"\n\n def __init__(self):\n self.label_smoothing = 0.1\n self.focal_gamma = 2.0\n self.entropy_weight = 0.02\n \n def compute_loss(self, logits, labels, model, epoch):\n # Cross-entropy with label smoothing\n ce_per = F.cross_entropy(logits, labels, reduction='none', label_smoothing=self.label_smoothing)\n \n # Focal weight based on confidence\n probs = torch.softmax(logits, dim=1)\n pt = probs.gather(1, labels.unsqueeze(1)).squeeze(1) # probability of true class\n focal_weight = (1 - pt) ** self.focal_gamma\n focal_loss = focal_weight * ce_per\n \n # Entropy bonus\n entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1)\n \n total_per = focal_loss - self.entropy_weight * entropy\n return total_per.mean()\n# ============================================================\n# END EDITABLE\n# ============================================================"
7}
OK: Replaced lines 1..23 with 34 line(s) in opacus/bench/membership/custom_membership_defense.py. Editable range: entire file.
[Current file: opacus/bench/membership/custom_membership_defense.py | editable: 1–34 | total: 34 lines]
1 """Editable membership-inference defense for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5
6 # ============================================================
7 # EDITABLE
8 # ============================================================
9 class MembershipDefense:
10 """Training-time regularizer for privacy-utility tradeoffs."""
11
12 def __init__(self):
13 self.label_smoothing = 0.1
14 self.focal_gamma = 2.0
15 self.entropy_weight = 0.02
16
17 def compute_loss(self, logits, labels, model, epoch):
18 # Cross-entropy with label smoothing
19 ce_per = F.cross_entropy(logits, la
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### mnist (scripts/mnist.sh)
TRAIN_METRICS epoch=1 loss=1.8034
TRAIN_METRICS epoch=2 loss=1.2453
TRAIN_METRICS epoch=3 loss=0.4717
TRAIN_METRICS epoch=4 loss=0.2282
TRAIN_METRICS epoch=5 loss=0.1519
mnist: test_acc=0.9020, mia_auc=0.5439, privacy_gap=0.0233, privacy_score=0.8581
### fmnist (scripts/fmnist.sh)
TRAIN_METRICS epoch=1 loss=1.5950
TRAIN_METRICS epoch=2 loss=0.8192
TRAIN_METRICS epoch=3 loss=0.6163
TRAIN_METRICS epoch=4 loss=0.5087
TRAIN_METRICS epoch=5 loss=0.4597
fmnist: test_acc=0.7050, mia_auc=0.5231, privacy_gap=0.0159, privacy_score=0.6819
### cifar10 (scripts/cifar10.sh)
TRAIN_METRICS epoch=1 loss=1.7494
TRAIN_METRICS epoch=2 loss=1.5514
TRAIN_METRICS epoch=3 loss=1.4054
TRAIN_METRICS epoch=4 loss=1.3046
TRAIN_METRICS epoch=5 loss=1.2844
TRAIN_METRICS epoch=6 loss=1.2050
cifar10: test_acc=0.3670, mia_auc=0.5030, privacy_gap=0.0020, privacy_score=0.3640
[Leaderboard] Results saved: {'test_acc_mnis
Tool Input
1{
2 "op": "replace",
3 "filename": "opacus/bench/membership/custom_membership_defense.py",
4 "start_line": 1,
5 "end_line": 34,
6 "content": "\"\"\"Editable membership-inference defense for MLS-Bench.\"\"\"\n\nimport torch\nimport torch.nn.functional as F\n\n# ============================================================\n# EDITABLE\n# ============================================================\nclass MembershipDefense:\n \"\"\"Training-time regularizer for privacy-utility tradeoffs.\"\"\"\n\n def __init__(self):\n self.label_smoothing = 0.1\n self.entropy_weight = 0.05\n self.confidence_threshold = 0.9\n self.penalty_weight = 0.1\n \n def compute_loss(self, logits, labels, model, epoch):\n # Cross-entropy with label smoothing\n ce = F.cross_entropy(logits, labels, label_smoothing=self.label_smoothing)\n \n probs = torch.softmax(logits, dim=1)\n \n # Entropy bonus\n entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1).mean()\n \n # Confidence penalty: penalize probabilities above threshold\n max_prob = probs.max(dim=1)[0]\n excess = torch.relu(max_prob - self.confidence_threshold)\n penalty = (excess ** 2).mean()\n \n total_loss = ce - self.entropy_weight * entropy + self.penalty_weight * penalty\n return total_loss\n# ============================================================\n# END EDITABLE\n# ============================================================"
7}
OK: Replaced lines 1..34 with 36 line(s) in opacus/bench/membership/custom_membership_defense.py. Editable range: entire file.
[Current file: opacus/bench/membership/custom_membership_defense.py | editable: 1–36 | total: 36 lines]
1 """Editable membership-inference defense for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5
6 # ============================================================
7 # EDITABLE
8 # ============================================================
9 class MembershipDefense:
10 """Training-time regularizer for privacy-utility tradeoffs."""
11
12 def __init__(self):
13 self.label_smoothing = 0.1
14 self.entropy_weight = 0.05
15 self.confidence_threshold = 0.9
16 self.penalty_weight = 0.1
17
18 def compute_loss(self, logits, labels, model, epoch):
19 # Cross-entropy with label smoothing
[Test #3] (0 tests remaining)
## Seed 42
### mnist (scripts/mnist.sh)
TRAIN_METRICS epoch=1 loss=2.1775
TRAIN_METRICS epoch=2 loss=1.8509
TRAIN_METRICS epoch=3 loss=1.2572
TRAIN_METRICS epoch=4 loss=0.8743
TRAIN_METRICS epoch=5 loss=0.7663
mnist: test_acc=0.9155, mia_auc=0.5594, privacy_gap=0.0342, privacy_score=0.8561
### fmnist (scripts/fmnist.sh)
TRAIN_METRICS epoch=1 loss=2.0660
TRAIN_METRICS epoch=2 loss=1.5047
TRAIN_METRICS epoch=3 loss=1.2723
TRAIN_METRICS epoch=4 loss=1.1560
TRAIN_METRICS epoch=5 loss=1.0791
fmnist: test_acc=0.7120, mia_auc=0.5210, privacy_gap=0.0162, privacy_score=0.6910
### cifar10 (scripts/cifar10.sh)
TRAIN_METRICS epoch=1 loss=2.1317
TRAIN_METRICS epoch=2 loss=1.9851
TRAIN_METRICS epoch=3 loss=1.8916
TRAIN_METRICS epoch=4 loss=1.7919
TRAIN_METRICS epoch=5 loss=1.7174
TRAIN_METRICS epoch=6 loss=1.6810
cifar10: test_acc=0.4360, mia_auc=0.4990, privacy_gap=-0.0001, privacy_score=0.4360
## Seed 123
### mnist (scripts/mnist.sh)
TRAIN_METRICS epoch=1 loss
Tool Input
1{
2 "op": "replace",
3 "filename": "pytorch-vision/custom_membership_defense.py",
4 "start_line": 1,
5 "end_line": 32,
6 "content": "\"\"\"Editable membership-inference defense for MLS-Bench.\"\"\"\n\nimport torch\nimport torch.nn.functional as F\n\n# ============================================================\n# EDITABLE\n# ============================================================\nclass MembershipDefense:\n \"\"\"Margin-aware entropy regularization for membership privacy.\n\n Loss = CrossEntropy - α * entropy(p) + β * max(0, margin - margin_target)\n where margin = p_max - p_second_max.\n Encourages predictions to be both uncertain (high entropy) and have\n small gap between top two probabilities, reducing overconfidence.\n \"\"\"\n\n def __init__(self):\n self.entropy_weight = 0.1 # α\n self.margin_weight = 0.05 # β\n self.margin_target = 0.3 # target margin to penalize above\n\n def compute_loss(self, logits, labels, epoch):\n # Cross-entropy\n ce = F.cross_entropy(logits, labels)\n \n # Predictive entropy bonus\n probs = torch.softmax(logits, dim=1)\n entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1).mean()\n \n # Margin penalty\n sorted_probs, _ = torch.sort(probs, dim=1, descending=True)\n p_max = sorted_probs[:, 0]\n p_second = sorted_probs[:, 1]\n margin = p_max - p_second\n margin_penalty = torch.clamp_min(margin - self.margin_target, 0).mean()\n \n total_loss = ce - self.entropy_weight * entropy + self.margin_weight * margin_penalty\n return total_loss\n# ============================================================\n# END EDITABLE\n# ============================================================"
7}
OK: Replaced lines 1..32 with 42 line(s) in pytorch-vision/custom_membership_defense.py. Editable range: entire file.
[Current file: pytorch-vision/custom_membership_defense.py | editable: 1–42 | total: 42 lines]
1 """Editable membership-inference defense for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5
6 # ============================================================
7 # EDITABLE
8 # ============================================================
9 class MembershipDefense:
10 """Margin-aware entropy regularization for membership privacy.
11
12 Loss = CrossEntropy - α * entropy(p) + β * max(0, margin - margin_target)
13 where margin = p_max - p_second_max.
14 Encourages predictions to be both uncertain (high entropy) and have
15 small gap between top two probabilities, reducing overconfidence.
16 """
17
18 def __init__(self):
19
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
## Seed 42
### resnet20-cifar10 (scripts/resnet20_cifar10.sh)
TRAIN_METRICS epoch=1 loss=1.6224 train_acc=32.90 lr=0.099975
TRAIN_METRICS epoch=10 loss=0.6084 train_acc=77.21 lr=0.097553
TRAIN_METRICS epoch=20 loss=0.4951 train_acc=81.59 lr=0.090451
TRAIN_METRICS epoch=30 loss=0.4208 train_acc=84.60 lr=0.079389
TRAIN_METRICS epoch=40 loss=0.3632 train_acc=86.81 lr=0.065451
TRAIN_METRICS epoch=50 loss=0.3173 train_acc=88.58 lr=0.050000
TRAIN_METRICS epoch=60 loss=0.2553 train_acc=91.14 lr=0.034549
TRAIN_METRICS epoch=70 loss=0.1845 train_acc=93.93 lr=0.020611
TRAIN_METRICS epoch=80 loss=0.1020 train_acc=97.54 lr=0.009549
TRAIN_METRICS epoch=90 loss=0.0624 train_acc=99.16 lr=0.002447
TRAIN_METRICS epoch=100 loss=0.0551 train_acc=99.53 lr=0.000000
resnet20-cifar10: test_acc=0.8816, mia_auc=0.5673, privacy_gap=0.0526, privacy_score=0.8143
### vgg16bn-cifar100 (scripts/vgg16bn_cifar100.sh)
TRAIN_METRICS epoch=1 loss=4
Tool Input
1{
2 "op": "replace",
3 "filename": "pytorch-vision/custom_membership_defense.py",
4 "start_line": 1,
5 "end_line": 42,
6 "content": "\"\"\"Editable membership-inference defense for MLS-Bench.\"\"\"\n\nimport torch\nimport torch.nn.functional as F\n\n# ============================================================\n# EDITABLE\n# ============================================================\nclass MembershipDefense:\n    \"\"\"Privacy-Calibrated Cross-Entropy with adaptive temperature.\n\n    Loss = CE(logits / T(epoch)) - α * entropy(p) + β * max(0, margin - margin_target)\n    where T(epoch) linearly increases from T_start to T_end over training.\n    Higher temperature smooths probabilities, reducing overconfidence.\n    \"\"\"\n\n    def __init__(self):\n        # Temperature annealing\n        self.temp_start = 1.0\n        self.temp_end = 2.0\n        # Regularization weights\n        self.entropy_weight = 0.1  # α\n        self.margin_weight = 0.05  # β\n        self.margin_target = 0.3  # target margin to penalize above\n        # Assume 100 epochs total (as in training loop)\n        self.total_epochs = 100\n\n    def compute_loss(self, logits, labels, epoch):\n        # Temperature annealing linear over epochs\n        temperature = self.temp_start + (self.temp_end - self.temp_start) * (epoch / (self.total_epochs - 1))\n        scaled_logits = logits / temperature\n        \n        # Temperature-scaled cross-entropy\n        ce = F.cross_entropy(scaled_logits, labels)\n        \n        # Predictive entropy bonus (on original logits or scaled?)\n        # Using scaled logits for consistency\n        probs = torch.softmax(scaled_logits, dim=1)\n        entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1).mean()\n        \n        # Margin penalty on original probabilities (or scaled?)\n        # Use scaled probabilities as they are used for loss\n        sorted_probs, _ = torch.sort(probs, dim=1, descending=True)\n        p_max = sorted_probs[:, 0]\n        p_second = sorted_probs[:, 1]\n        margin = p_max - p_second\n        margin_penalty = torch.clamp_min(margin - self.margin_target, 0).mean()\n        \n        total_loss = ce - self.entropy_weight * entropy + self.margin_weight * margin_penalty\n        return total_loss\n# ============================================================\n# END EDITABLE\n# ============================================================"
7}