Agent Conversation: deepseek-reasoner
Task: security-machine-unlearning
Machine Unlearning via Targeted Update Rules
Research Question
How can we design a stronger unlearning update rule that removes information about a forget set while retaining as much utility as possible on the retained data?
Background
Machine unlearning methods approximate the effect of retraining without the deleted data. The central tradeoff is clear: aggressive forgetting reduces utility, while conservative updates leave measurable traces of the forgotten examples.
Task
Implement a better unlearning rule in bench/unlearning/custom_unlearning.py. The fixed harness trains an initial model, defines a forget split, and then applies your update rule for a fixed number of unlearning steps using retain and forget minibatches.
Your method should lower forget-set memorization while preserving retained-task accuracy.
Editable Interface
You must implement:
class UnlearningMethod:
    def unlearn_step(self, model, retain_batch, forget_batch, optimizer, step, epoch):
        ...
- retain_batch: minibatch sampled from retained data
- forget_batch: minibatch sampled from the forget set
- optimizer: fixed optimizer instance
- Return value: dict with at least loss
The architecture, initial training, forget split, and evaluation probes are fixed.
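For orientation, a rough sketch of the kind of loop such a harness might run; the function name, loader construction, and epoch count here are illustrative assumptions, not the benchmark's actual code:

```python
from torch.utils.data import DataLoader

def run_unlearning_phase(model, retain_set, forget_set, method, optimizer,
                         epochs=2, batch_size=128):
    """Hypothetical driver: pair retain and forget minibatches and hand them
    to the user-supplied update rule for a fixed number of steps."""
    retain_loader = DataLoader(retain_set, batch_size=batch_size, shuffle=True)
    forget_loader = DataLoader(forget_set, batch_size=batch_size, shuffle=True)
    for epoch in range(1, epochs + 1):
        forget_iter = iter(forget_loader)
        for step, retain_batch in enumerate(retain_loader):
            try:
                forget_batch = next(forget_iter)
            except StopIteration:  # the forget set is smaller, so cycle it
                forget_iter = iter(forget_loader)
                forget_batch = next(forget_iter)
            stats = method.unlearn_step(model, retain_batch, forget_batch,
                                        optimizer, step, epoch)
        print(f"TRAIN_METRICS phase=unlearn epoch={epoch} loss={stats['loss']:.4f}")
```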
Evaluation
Benchmarks:
- mnist-class0
- fmnist-class0
- cifar10-class0
Reported metrics:
- retain_acc
- forget_acc
- forget_mia_auc
- unlearn_score
Primary metric: unlearn_score (higher is better).
Baselines
- retain_finetune: continue training only on retained data
- negative_gradient: ascend the forget loss and descend the retain loss (a minimal sketch follows this list)
- bad_teacher: distillation-style forgetting baseline
- scrub: stronger representation-scrubbing baseline
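A minimal sketch of the negative-gradient style of update under the unlearn_step interface; the class name and the 0.5 weighting are illustrative assumptions, not the benchmark's implementation:

```python
import torch.nn.functional as F

class NegativeGradientSketch:
    """Illustrative only: one joint step that descends the retain loss
    and ascends the forget loss."""

    def __init__(self, forget_weight=0.5):  # assumed weight, not the baseline's setting
        self.forget_weight = forget_weight

    def unlearn_step(self, model, retain_batch, forget_batch, optimizer, step, epoch):
        retain_x, retain_y = retain_batch
        forget_x, forget_y = forget_batch
        retain_loss = F.cross_entropy(model(retain_x), retain_y)
        forget_loss = F.cross_entropy(model(forget_x), forget_y)
        # Subtracting the forget loss turns descent on it into ascent.
        loss = retain_loss - self.forget_weight * forget_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return {"loss": loss.item()}
```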
opacus/bench/unlearning/custom_unlearning.py [EDITABLE — entire file only]
1: """Editable unlearning method for MLS-Bench."""
2:
3: import torch
4: import torch.nn.functional as F
5:
6: # ============================================================
7: # EDITABLE
8: # ============================================================
9: class UnlearningMethod:
10: """Default retain-only finetuning update."""
11:
12: def __init__(self):
13: self.forget_weight = 0.0
14:
15: def unlearn_step(self, model, retain_batch, forget_batch, optimizer, step, epoch):
16: retain_x, retain_y = retain_batch
17: logits = model(retain_x)
18: loss = F.cross_entropy(logits, retain_y)
19: optimizer.zero_grad()
20: loss.backward()
21: optimizer.step()
22: return {"loss": loss.item()}
23: # ============================================================
24: # END EDITABLE
25: # ============================================================
retain_finetune baseline — editable region [READ-ONLY — ref
Tool Input
1{
2 "op": "replace",
3 "filename": "opacus/bench/unlearning/custom_unlearning.py",
4 "start_line": 1,
5 "end_line": 25,
6 "content": "\"\"\"Editable unlearning method for MLS-Bench.\"\"\"\n\nimport torch\nimport torch.nn.functional as F\n\n# ============================================================\n# EDITABLE\n# ============================================================\nclass UnlearningMethod:\n \"\"\"Unlearning via KL divergence to uniform with weight regularization.\"\"\"\n\n def __init__(self):\n self.kl_weight = 0.5 # weight for KL loss on forget data\n self.reg_weight = 0.01 # weight for L2 regularization on weight changes\n # store initial model parameters after first call\n self.init_params = None\n\n def unlearn_step(self, model, retain_batch, forget_batch, optimizer, step, epoch):\n retain_x, retain_y = retain_batch\n forget_x, forget_y = forget_batch\n \n # Store initial parameters if not done\n if self.init_params is None:\n self.init_params = [p.detach().clone() for p in model.parameters()]\n \n # Retain loss\n retain_logits = model(retain_x)\n retain_loss = F.cross_entropy(retain_logits, retain_y)\n \n # KL loss on forget data: push predictions towards uniform distribution\n forget_logits = model(forget_x)\n num_classes = forget_logits.size(1)\n uniform_target = torch.ones_like(forget_logits) / num_classes\n forget_kl = F.kl_div(\n F.log_softmax(forget_logits, dim=1),\n uniform_target,\n reduction='batchmean'\n )\n \n # L2 regularization on weight changes from initial model\n reg_loss = 0.0\n for p, init_p in zip(model.parameters(), self.init_params):\n reg_loss += torch.sum((p - init_p) ** 2)\n reg_loss = reg_loss / sum(p.numel() for p in model.parameters()) # average per parameter\n \n # Total loss\n loss = retain_loss + self.kl_weight * forget_kl + self.reg_weight * reg_loss\n \n optimizer.zero_grad()\n loss.backward()\n optimizer.step()\n \n return {\n \"loss\": loss.item(),\n \"retain_loss\": retain_loss.item(),\n \"forget_kl\": forget_kl.item(),\n \"reg_loss\": reg_loss.item()\n }\n# ============================================================\n# END EDITABLE\n# ============================================================"
7}
OK: Replaced lines 1..25 with 61 line(s) in opacus/bench/unlearning/custom_unlearning.py. Editable range: entire file.
[Current file: opacus/bench/unlearning/custom_unlearning.py | editable: 1–61 | total: 61 lines]
1 """Editable unlearning method for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5
6 # ============================================================
7 # EDITABLE
8 # ============================================================
9 class UnlearningMethod:
10 """Unlearning via KL divergence to uniform with weight regularization."""
11
12 def __init__(self):
13 self.kl_weight = 0.5 # weight for KL loss on forget data
14 self.reg_weight = 0.01 # weight for L2 regularization on weight changes
15 # store initial model parameters after first call
16 self.init_params = None
17
18 def unlearn_step(self, model, retain_ba
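Written out, the per-step objective this first attempt optimizes (reconstructed from the tool call above) is

$$
\mathcal{L} = \mathrm{CE}\big(f_\theta(x_r), y_r\big) + 0.5\,\mathrm{KL}\big(u \,\|\, p_\theta(\cdot \mid x_f)\big) + \frac{0.01}{d}\,\lVert \theta - \theta_0 \rVert_2^2
$$

where $(x_r, y_r)$ is the retain minibatch, $x_f$ the forget minibatch, $u$ the uniform distribution over classes, $\theta_0$ the parameters snapshotted on the first unlearning step, and $d$ the total number of parameters.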
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

## Seed 42

### mnist-class0 (scripts/mnist_class0.sh)
TRAIN_METRICS phase=pretrain epoch=1 loss=2.2914
TRAIN_METRICS phase=pretrain epoch=2 loss=1.8431
TRAIN_METRICS phase=pretrain epoch=3 loss=1.2475
TRAIN_METRICS phase=pretrain epoch=4 loss=0.6942
TRAIN_METRICS phase=unlearn epoch=1 loss=2.0677
TRAIN_METRICS phase=unlearn epoch=2 loss=1.3075
mnist-class0: retain_acc=0.7616, forget_acc=0.0000, forget_mia_auc=0.4335, unlearn_score=0.7760

### fmnist-class0 (scripts/fmnist_class0.sh)
TRAIN_METRICS phase=pretrain epoch=1 loss=2.1444
TRAIN_METRICS phase=pretrain epoch=2 loss=1.2730
TRAIN_METRICS phase=pretrain epoch=3 loss=0.9583
TRAIN_METRICS phase=pretrain epoch=4 loss=0.8279
TRAIN_METRICS phase=unlearn epoch=1 loss=2.1418
TRAIN_METRICS phase=unlearn epoch=2 loss=1.3581
fmnist-class0: retain_acc=0.6339, forget_acc=0.0000, forget_mia_auc=0.4773, unlearn_score=0.7188

### cifar10-class0 (scripts/cifar10_class0.sh)
TR
Tool Input
1{
2 "op": "replace",
3 "filename": "opacus/bench/unlearning/custom_unlearning.py",
4 "start_line": 1,
5 "end_line": 61,
6 "content": "\"\"\"Editable unlearning method for MLS-Bench.\"\"\"\n\nimport torch\nimport torch.nn.functional as F\n\n# ============================================================\n# EDITABLE\n# ============================================================\nclass UnlearningMethod:\n \"\"\"Unlearning via gradient projection with KL regularization.\"\"\"\n\n def __init__(self):\n self.kl_weight = 0.1 # weight for KL loss on forget data\n self.projection_weight = 1.0 # weight for gradient projection (orthogonalization)\n self.use_projection = True\n\n def unlearn_step(self, model, retain_batch, forget_batch, optimizer, step, epoch):\n retain_x, retain_y = retain_batch\n forget_x, forget_y = forget_batch\n \n # Compute retain loss\n retain_logits = model(retain_x)\n retain_loss = F.cross_entropy(retain_logits, retain_y)\n \n # Compute forget loss (KL to uniform)\n forget_logits = model(forget_x)\n num_classes = forget_logits.size(1)\n uniform_target = torch.ones_like(forget_logits) / num_classes\n forget_kl = F.kl_div(\n F.log_softmax(forget_logits, dim=1),\n uniform_target,\n reduction='batchmean'\n )\n \n # Total loss before projection\n loss = retain_loss + self.kl_weight * forget_kl\n \n if self.use_projection:\n # Compute gradients for retain and forget losses separately\n # First, compute retain gradients\n optimizer.zero_grad()\n retain_loss.backward(retain_graph=True)\n retain_grads = []\n for p in model.parameters():\n if p.grad is not None:\n retain_grads.append(p.grad.detach().clone())\n else:\n retain_grads.append(torch.zeros_like(p))\n optimizer.zero_grad()\n \n # Compute forget gradients (using forget_kl)\n forget_kl.backward(retain_graph=True)\n forget_grads = []\n for p in model.parameters():\n if p.grad is not None:\n forget_grads.append(p.grad.detach().clone())\n else:\n forget_grads.append(torch.zeros_like(p))\n optimizer.zero_grad()\n \n # Project retain gradients onto orthogonal complement of forget gradients\n projected_grads = []\n for g_ret, g_for in zip(retain_grads, forget_grads):\n # Compute dot product and norm\n dot = torch.sum(g_ret * g_for)\n norm_for = torch.sum(g_for * g_for)\n # Avoid division by zero\n if norm_for > 1e-10:\n scale = dot / norm_for\n g_proj = g_ret - scale * g_for\n else:\n g_proj = g_ret\n projected_grads.append(g_proj)\n \n # Assign projected gradients to model parameters\n for p, g in zip(model.parameters(), projected_grads):\n p.grad = g.clone()\n \n # Step optimizer\n optimizer.step()\n else:\n # Standard gradient descent\n optimizer.zero_grad()\n loss.backward()\n optimizer.step()\n \n return {\n \"loss\": loss.item(),\n \"retain_loss\": retain_loss.item(),\n \"forget_kl\": forget_kl.item()\n }\n# ============================================================\n# END EDITABLE\n# ============================================================"
7}
OK: Replaced lines 1..61 with 94 line(s) in opacus/bench/unlearning/custom_unlearning.py. Editable range: entire file.
[Current file: opacus/bench/unlearning/custom_unlearning.py | editable: 1–94 | total: 94 lines]
1 """Editable unlearning method for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5
6 # ============================================================
7 # EDITABLE
8 # ============================================================
9 class UnlearningMethod:
10 """Unlearning via gradient projection with KL regularization."""
11
12 def __init__(self):
13 self.kl_weight = 0.1 # weight for KL loss on forget data
14 self.projection_weight = 1.0 # weight for gradient projection (orthogonalization)
15 self.use_projection = True
16
17 def unlearn_step(self, model, retain_batch, forget_batch, optimizer, step, epoch):
18 re
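For reference, the projection in this second attempt removes, per parameter tensor, the component of the retain gradient that lies along the forget-KL gradient and steps only with the remainder:

$$
g_r^{\perp} = g_r - \frac{\langle g_r, g_f\rangle}{\lVert g_f\rVert^2}\, g_f
$$

with $g_r = \nabla_\theta\,\mathrm{CE}(f_\theta(x_r), y_r)$ and $g_f = \nabla_\theta\,\mathrm{KL}(u \,\|\, p_\theta(\cdot \mid x_f))$. Note that kl_weight only scales the reported loss; the parameter update uses $g_r^{\perp}$ alone, so the forget objective enters only through the projection.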
[Test #2] (1 test remaining; call submit(n=2) to submit this result)

### mnist-class0 (scripts/mnist_class0.sh)
TRAIN_METRICS phase=pretrain epoch=1 loss=2.2933
TRAIN_METRICS phase=pretrain epoch=2 loss=1.7685
TRAIN_METRICS phase=pretrain epoch=3 loss=0.9919
TRAIN_METRICS phase=pretrain epoch=4 loss=0.5718
TRAIN_METRICS phase=unlearn epoch=1 loss=1.2878
TRAIN_METRICS phase=unlearn epoch=2 loss=1.8940
mnist-class0: retain_acc=0.7512, forget_acc=0.9714, forget_mia_auc=0.5751, unlearn_score=0.4016

### fmnist-class0 (scripts/fmnist_class0.sh)
TRAIN_METRICS phase=pretrain epoch=1 loss=2.1445
TRAIN_METRICS phase=pretrain epoch=2 loss=1.3109
TRAIN_METRICS phase=pretrain epoch=3 loss=0.9688
TRAIN_METRICS phase=pretrain epoch=4 loss=0.8449
TRAIN_METRICS phase=unlearn epoch=1 loss=1.7853
TRAIN_METRICS phase=unlearn epoch=2 loss=2.5286
fmnist-class0: retain_acc=0.6611, forget_acc=0.0000, forget_mia_auc=0.5117, unlearn_score=0.7165

### cifar10-class0 (scripts/cifar10_class0.sh)
TRAIN_METRICS ph
Tool Input
1{
2 "op": "replace",
3 "filename": "opacus/bench/unlearning/custom_unlearning.py",
4 "start_line": 1,
5 "end_line": 94,
6 "content": "\"\"\"Editable unlearning method for MLS-Bench.\"\"\"\n\nimport torch\nimport torch.nn.functional as F\n\n# ============================================================\n# EDITABLE\n# ============================================================\nclass UnlearningMethod:\n \"\"\"Unlearning via gradient surgery with orthogonal retain and negative forget.\"\"\"\n\n def __init__(self):\n self.forget_weight = 0.5 # weight for negative forget gradient\n self.orthogonal_weight = 1.0 # weight for orthogonal retain component\n self.use_kl = False # whether to use KL to uniform instead of CE for forget\n self.kl_weight = 0.1\n\n def unlearn_step(self, model, retain_batch, forget_batch, optimizer, step, epoch):\n retain_x, retain_y = retain_batch\n forget_x, forget_y = forget_batch\n \n # Compute retain loss\n retain_logits = model(retain_x)\n retain_loss = F.cross_entropy(retain_logits, retain_y)\n \n # Compute forget loss\n forget_logits = model(forget_x)\n if self.use_kl:\n num_classes = forget_logits.size(1)\n uniform_target = torch.ones_like(forget_logits) / num_classes\n forget_loss = F.kl_div(\n F.log_softmax(forget_logits, dim=1),\n uniform_target,\n reduction='batchmean'\n )\n else:\n forget_loss = F.cross_entropy(forget_logits, forget_y)\n \n # Compute gradients separately\n optimizer.zero_grad()\n retain_loss.backward(retain_graph=True)\n retain_grads = []\n for p in model.parameters():\n if p.grad is not None:\n retain_grads.append(p.grad.detach().clone())\n else:\n retain_grads.append(torch.zeros_like(p))\n optimizer.zero_grad()\n \n forget_loss.backward(retain_graph=True)\n forget_grads = []\n for p in model.parameters():\n if p.grad is not None:\n forget_grads.append(p.grad.detach().clone())\n else:\n forget_grads.append(torch.zeros_like(p))\n optimizer.zero_grad()\n \n # Compute orthogonal component of retain gradient wrt forget gradient\n orthogonal_grads = []\n for g_ret, g_for in zip(retain_grads, forget_grads):\n dot = torch.sum(g_ret * g_for)\n norm_for = torch.sum(g_for * g_for)\n if norm_for > 1e-10:\n scale = dot / norm_for\n g_orth = g_ret - scale * g_for\n else:\n g_orth = g_ret\n orthogonal_grads.append(g_orth)\n \n # Combined gradient: orthogonal retain component minus weighted forget gradient\n combined_grads = []\n for g_orth, g_for in zip(orthogonal_grads, forget_grads):\n g_combined = self.orthogonal_weight * g_orth - self.forget_weight * g_for\n combined_grads.append(g_combined)\n \n # Assign gradients to model parameters\n for p, g in zip(model.parameters(), combined_grads):\n p.grad = g.clone()\n \n # Step optimizer\n optimizer.step()\n \n return {\n \"loss\": retain_loss.item() + self.forget_weight * forget_loss.item(),\n \"retain_loss\": retain_loss.item(),\n \"forget_loss\": forget_loss.item()\n }\n# ============================================================\n# END EDITABLE\n# ============================================================"
7}
OK: Replaced lines 1..94 with 91 line(s) in opacus/bench/unlearning/custom_unlearning.py. Editable range: entire file.
[Current file: opacus/bench/unlearning/custom_unlearning.py | editable: 1–91 | total: 91 lines]
1 """Editable unlearning method for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5
6 # ============================================================
7 # EDITABLE
8 # ============================================================
9 class UnlearningMethod:
10 """Unlearning via gradient surgery with orthogonal retain and negative forget."""
11
12 def __init__(self):
13 self.forget_weight = 0.5 # weight for negative forget gradient
14 self.orthogonal_weight = 1.0 # weight for orthogonal retain component
15 self.use_kl = False # whether to use KL to uniform instead of CE for forget
16 self.kl_weight = 0.1
17
18 def
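The combined direction this third attempt steps along, per parameter tensor, is

$$
g = \alpha\left(g_r - \frac{\langle g_r, g_f\rangle}{\lVert g_f\rVert^2}\, g_f\right) - \lambda\, g_f, \qquad \alpha = 1.0,\ \lambda = 0.5,
$$

where $g_r$ and $g_f$ are the retain and forget cross-entropy gradients (use_kl is False). The $-\lambda\, g_f$ term is plain, unclipped gradient ascent on the forget loss, consistent with the rapidly growing unlearn-phase losses in the run below.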
[Test #3] (0 tests remaining)

## Seed 42

### mnist-class0 (scripts/mnist_class0.sh)
TRAIN_METRICS phase=pretrain epoch=1 loss=2.2914
TRAIN_METRICS phase=pretrain epoch=2 loss=1.8422
TRAIN_METRICS phase=pretrain epoch=3 loss=1.2468
TRAIN_METRICS phase=pretrain epoch=4 loss=0.6892
TRAIN_METRICS phase=unlearn epoch=1 loss=10.8208
TRAIN_METRICS phase=unlearn epoch=2 loss=67.7496
mnist-class0: retain_acc=0.0981, forget_acc=0.0000, forget_mia_auc=0.2389, unlearn_score=0.6197

### fmnist-class0 (scripts/fmnist_class0.sh)
TRAIN_METRICS phase=pretrain epoch=1 loss=2.1445
TRAIN_METRICS phase=pretrain epoch=2 loss=1.3084
TRAIN_METRICS phase=pretrain epoch=3 loss=0.9705
TRAIN_METRICS phase=pretrain epoch=4 loss=0.8428
TRAIN_METRICS phase=unlearn epoch=1 loss=7.7461
TRAIN_METRICS phase=unlearn epoch=2 loss=63.5909
fmnist-class0: retain_acc=0.1883, forget_acc=0.0000, forget_mia_auc=0.5195, unlearn_score=0.5563

### cifar10-class0 (scripts/cifar10_class0.sh)
TRAIN_METRICS phase=pretrain epoch=1 lo
Tool Input
1{
2 "op": "replace",
3 "filename": "pytorch-vision/bench/unlearning/custom_unlearning.py",
4 "start_line": 1,
5 "end_line": 25,
6 "content": "\"\"\"Editable unlearning method for MLS-Bench.\"\"\"\n\nimport torch\nimport torch.nn.functional as F\n\n# ============================================================\n# EDITABLE\n# ============================================================\nclass UnlearningMethod:\n \"\"\"Gradient Projection with Orthogonal Forgetting.\n \n Computes gradients from retain and forget losses, projects forget gradient\n orthogonal to retain gradient, then combines them to update model.\n This aims to forget while minimally interfering with retain performance.\n \"\"\"\n\n def __init__(self):\n self.forget_weight = 0.5 # λ: weight for orthogonal forget gradient\n self.eps = 1e-8\n\n def unlearn_step(self, model, retain_batch, forget_batch, optimizer, step, epoch):\n retain_x, retain_y = retain_batch\n forget_x, forget_y = forget_batch\n \n # Compute retain loss and gradient\n model.zero_grad()\n retain_logits = model(retain_x)\n retain_loss = F.cross_entropy(retain_logits, retain_y)\n retain_loss.backward(retain_graph=True) # keep graph for second backward\n \n # Collect retain gradient\n retain_grad = []\n for param in model.parameters():\n if param.grad is not None:\n retain_grad.append(param.grad.clone())\n else:\n retain_grad.append(torch.zeros_like(param))\n \n # Compute forget loss and gradient\n model.zero_grad()\n forget_logits = model(forget_x)\n forget_loss = F.cross_entropy(forget_logits, forget_y)\n forget_loss.backward()\n \n # Collect forget gradient\n forget_grad = []\n for param in model.parameters():\n if param.grad is not None:\n forget_grad.append(param.grad.clone())\n else:\n forget_grad.append(torch.zeros_like(param))\n \n # Compute projection coefficient: (g_forget · g_retain) / (||g_retain||^2 + eps)\n numerator = 0.0\n denominator = 0.0\n for rg, fg in zip(retain_grad, forget_grad):\n numerator += torch.sum(rg * fg)\n denominator += torch.sum(rg * rg)\n proj_coef = numerator / (denominator + self.eps)\n \n # Compute orthogonal forget gradient: g_forget_orth = g_forget - proj_coef * g_retain\n # Then combine: total_gradient = g_retain - λ * g_forget_orth\n model.zero_grad()\n for param, rg, fg in zip(model.parameters(), retain_grad, forget_grad):\n if param.grad is None:\n param.grad = torch.zeros_like(param)\n fg_orth = fg - proj_coef * rg\n total_grad = rg - self.forget_weight * fg_orth\n param.grad.copy_(total_grad)\n \n # Optimizer step\n optimizer.step()\n \n return {\n \"loss\": retain_loss.item() - self.forget_weight * forget_loss.item(),\n \"retain_loss\": retain_loss.item(),\n \"forget_loss\": forget_loss.item(),\n \"proj_coef\": proj_coef.item()\n }\n# ============================================================\n# END EDITABLE\n# ============================================================"
7}
OK: Replaced lines 1..25 with 82 line(s) in pytorch-vision/bench/unlearning/custom_unlearning.py. Editable range: entire file.
[Current file: pytorch-vision/bench/unlearning/custom_unlearning.py | editable: 1–82 | total: 82 lines]
1 """Editable unlearning method for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5
6 # ============================================================
7 # EDITABLE
8 # ============================================================
9 class UnlearningMethod:
10 """Gradient Projection with Orthogonal Forgetting.
11
12 Computes gradients from retain and forget losses, projects forget gradient
13 orthogonal to retain gradient, then combines them to update model.
14 This aims to forget while minimally interfering with retain performance.
15 """
16
17 def __init__(self):
18 self.forget_weight = 0.5 # λ: weight fo
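This fourth attempt reverses the projection: the forget gradient is orthogonalized against the retain gradient, with the dot products taken over all parameters jointly, and the update is

$$
g_f^{\perp} = g_f - \frac{\langle g_f, g_r\rangle}{\lVert g_r\rVert^2 + \varepsilon}\, g_r, \qquad
g = g_r - \lambda\, g_f^{\perp}, \qquad \lambda = 0.5,\ \varepsilon = 10^{-8}.
$$

The reported loss is $\mathrm{CE}_{\text{retain}} - \lambda\,\mathrm{CE}_{\text{forget}}$, which explains the increasingly negative unlearn-phase losses in the run below.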
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)

## Seed 42

### resnet20-cifar10-class0 (scripts/resnet20_cifar10_class0.sh)
TRAIN_METRICS phase=pretrain epoch=1 loss=1.5574 acc=42.08
TRAIN_METRICS phase=pretrain epoch=10 loss=0.5655 acc=80.50
TRAIN_METRICS phase=pretrain epoch=20 loss=0.4727 acc=83.88
TRAIN_METRICS phase=pretrain epoch=30 loss=0.4142 acc=85.77
TRAIN_METRICS phase=pretrain epoch=40 loss=0.3473 acc=88.18
TRAIN_METRICS phase=pretrain epoch=50 loss=0.2721 acc=90.71
TRAIN_METRICS phase=pretrain epoch=60 loss=0.1868 acc=93.53
TRAIN_METRICS phase=pretrain epoch=70 loss=0.0872 acc=97.12
TRAIN_METRICS phase=pretrain epoch=80 loss=0.0496 acc=98.61
TRAIN_METRICS phase=unlearn epoch=1 loss=-6.1474
TRAIN_METRICS phase=unlearn epoch=5 loss=-26.7190
TRAIN_METRICS phase=unlearn epoch=10 loss=-90.4118
TRAIN_METRICS phase=unlearn epoch=15 loss=-211.1105
TRAIN_METRICS phase=unlearn epoch=20 loss=-384.1965
resnet20-cifar10-class0: retain_acc=0.1027, forget_acc=0.0
Tool Input
1{
2 "op": "replace",
3 "filename": "pytorch-vision/bench/unlearning/custom_unlearning.py",
4 "start_line": 1,
5 "end_line": 82,
6 "content": "\"\"\"Editable unlearning method for MLS-Bench.\"\"\"\n\nimport torch\nimport torch.nn.functional as F\n\n# ============================================================\n# EDITABLE\n# ============================================================\nclass UnlearningMethod:\n \"\"\"Gradient Projection with Orthogonal Forgetting.\n \n Computes gradients from retain and forget losses using autograd.grad,\n projects forget gradient orthogonal to retain gradient, then combines.\n Includes gradient clipping and forget weight scheduling.\n \"\"\"\n\n def __init__(self):\n self.forget_weight = 0.5 # λ: initial weight for orthogonal forget gradient\n self.eps = 1e-8\n self.grad_clip = 1.0 # clip gradient norm\n\n def unlearn_step(self, model, retain_batch, forget_batch, optimizer, step, epoch):\n retain_x, retain_y = retain_batch\n forget_x, forget_y = forget_batch\n \n # Compute retain loss\n retain_logits = model(retain_x)\n retain_loss = F.cross_entropy(retain_logits, retain_y)\n \n # Compute forget loss\n forget_logits = model(forget_x)\n forget_loss = F.cross_entropy(forget_logits, forget_y)\n \n # Compute gradients using autograd.grad (does not accumulate in .grad)\n params = list(model.parameters())\n \n # Gradient of retain loss w.r.t parameters\n g_retain = torch.autograd.grad(retain_loss, params, retain_graph=True)\n \n # Gradient of forget loss w.r.t parameters\n g_forget = torch.autograd.grad(forget_loss, params)\n \n # Flatten gradients for dot product\n flat_retain = torch.cat([g.reshape(-1) for g in g_retain if g is not None])\n flat_forget = torch.cat([g.reshape(-1) for g in g_forget if g is not None])\n \n # Compute projection coefficient: (g_forget · g_retain) / (||g_retain||^2 + eps)\n numerator = torch.dot(flat_forget, flat_retain)\n denominator = torch.dot(flat_retain, flat_retain) + self.eps\n proj_coef = numerator / denominator\n \n # Clip projection coefficient to reasonable range\n proj_coef = torch.clamp(proj_coef, -10.0, 10.0)\n \n # Compute orthogonal forget gradient: g_forget_orth = g_forget - proj_coef * g_retain\n # Then total gradient: total_grad = g_retain - λ * g_forget_orth\n # Schedule forget weight λ: cosine annealing from 0.5 to 0.1 over epochs\n total_epochs = 20\n if total_epochs > 1:\n curr_lambda = self.forget_weight * (0.5 + 0.5 * torch.cos(torch.tensor(epoch * 3.14159 / total_epochs)))\n else:\n curr_lambda = self.forget_weight\n \n # Combine gradients\n total_grad = []\n for rg, fg in zip(g_retain, g_forget):\n if rg is None or fg is None:\n total_grad.append(None)\n else:\n fg_orth = fg - proj_coef * rg\n total = rg - curr_lambda * fg_orth\n total_grad.append(total)\n \n # Zero existing gradients\n model.zero_grad()\n \n # Assign total gradient to model parameters\n for param, grad in zip(params, total_grad):\n if grad is not None:\n param.grad = grad\n \n # Clip gradient norm\n torch.nn.utils.clip_grad_norm_(model.parameters(), self.grad_clip)\n \n # Optimizer step\n optimizer.step()\n \n return {\n \"loss\": retain_loss.item() - curr_lambda * forget_loss.item(),\n \"retain_loss\": retain_loss.item(),\n \"forget_loss\": forget_loss.item(),\n \"proj_coef\": proj_coef.item(),\n \"lambda\": curr_lambda.item() if isinstance(curr_lambda, torch.Tensor) else curr_lambda\n }\n# ============================================================\n# END EDITABLE\n# ============================================================"
7}
OK: Replaced lines 1..82 with 97 line(s) in pytorch-vision/bench/unlearning/custom_unlearning.py. Editable range: entire file.
[Current file: pytorch-vision/bench/unlearning/custom_unlearning.py | editable: 1–97 | total: 97 lines]
1 """Editable unlearning method for MLS-Bench."""
2
3 import torch
4 import torch.nn.functional as F
5
6 # ============================================================
7 # EDITABLE
8 # ============================================================
9 class UnlearningMethod:
10 """Gradient Projection with Orthogonal Forgetting.
11
12 Computes gradients from retain and forget losses using autograd.grad,
13 projects forget gradient orthogonal to retain gradient, then combines.
14 Includes gradient clipping and forget weight scheduling.
15 """
16
17 def __init__(self):
18 self.forget_weight = 0.5 # λ: initial weight for orthogo
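The λ schedule added in this final attempt anneals the weight on the orthogonalized forget gradient with a cosine over the unlearning epochs (hard-coded to T = 20 in the tool call), alongside a clamp of the projection coefficient to [−10, 10] and a global gradient-norm clip of 1.0:

$$
\lambda(\text{epoch}) = \lambda_0\left(\tfrac{1}{2} + \tfrac{1}{2}\cos\frac{\pi \cdot \text{epoch}}{T}\right), \qquad \lambda_0 = 0.5,\ T = 20.
$$

Despite the in-code comment about annealing from 0.5 to 0.1, this schedule actually decays toward 0 as the epoch approaches T.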