Agent Conversation: google/gemini-3.1-pro-preview
Task: optimization-variance-reduction
Variance Reduction for Stochastic Optimization
Research Question
Design an improved variance reduction strategy for stochastic gradient descent on finite-sum optimization problems. Your method should accelerate convergence compared to vanilla mini-batch SGD by reducing the variance of gradient estimates.
Background
Many machine learning problems take the form of finite-sum optimization:
min_x F(x) = (1/n) * sum_{i=1}^{n} f_i(x)
Standard SGD uses a stochastic gradient from a random mini-batch, which has variance proportional to 1/b (where b is the batch size). Variance reduction methods use auxiliary information (snapshots, recursive corrections, momentum) to reduce this variance, enabling faster convergence -- often achieving linear convergence rates for strongly convex problems where SGD only achieves sublinear rates.
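The 1/b scaling is easy to see empirically. Below is a toy sketch (a synthetic quadratic finite sum, not one of the benchmark problems; all names are illustrative) measuring the variance of mini-batch gradient estimates at two batch sizes:

```python
import random
import statistics

# Toy finite sum: f_i(x) = 0.5 * (x - c_i)^2, so grad f_i(x) = x - c_i.
# A mini-batch gradient averages b i.i.d. component gradients, so its
# variance is Var[grad f_i] / b.
random.seed(0)
n = 1000
c = [random.gauss(0.0, 1.0) for _ in range(n)]
x = 0.0

def minibatch_grad(b):
    idx = [random.randrange(n) for _ in range(b)]  # sample with replacement
    return sum(x - c[i] for i in idx) / b

def grad_variance(b, trials=2000):
    return statistics.pvariance([minibatch_grad(b) for _ in range(trials)])

# Growing the batch 16x should shrink the variance roughly 16x.
ratio = grad_variance(1) / grad_variance(16)
```

The catch, of course, is that a 16x larger batch also costs 16x more gradient computations per step; variance reduction methods aim to get the smaller variance without that price.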
Key methods in this area include SVRG (periodic full gradient + control variate), SARAH (recursive gradient correction), STORM (momentum-based online variance reduction), SPIDER, and PAGE.
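To make the control-variate idea behind SVRG concrete, here is a hedged toy sketch (synthetic affine component gradients, not the benchmark problems; `a`, `c`, and `full_grad` are illustrative names). The estimator g_i(x) - g_i(x_snap) + F'(x_snap) has the same expectation as the plain stochastic gradient, but far smaller variance when the iterate x is near the snapshot:

```python
import random
import statistics

# Toy components with gradients g_i(x) = a_i * x - c_i, so the full gradient
# is mean(a) * x - mean(c).
random.seed(1)
n = 500
a = [1.0 + random.uniform(0.0, 0.5) for _ in range(n)]
c = [random.gauss(0.0, 2.0) for _ in range(n)]

def full_grad(x):
    return sum(ai * x - ci for ai, ci in zip(a, c)) / n

x_snap = 0.0              # snapshot where the full gradient was stored
x = 0.3                   # current iterate, near the snapshot
mu = full_grad(x_snap)    # one (expensive) full-gradient evaluation

plain, svrg = [], []
for _ in range(5000):
    i = random.randrange(n)
    g = a[i] * x - c[i]                  # plain stochastic gradient
    g_snap = a[i] * x_snap - c[i]        # same component at the snapshot
    plain.append(g)
    svrg.append(g - g_snap + mu)         # SVRG control-variate estimator

# Both sample means approximate full_grad(x); the SVRG samples have much
# smaller spread because g and g_snap are strongly correlated.
```

SARAH and STORM replace the fixed snapshot with a recursively updated reference point, which is what the attempts below experiment with.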
Task
Modify the VarianceReductionOptimizer class in custom_vr.py (lines 286-370). You must implement:
- __init__(self, model, lr, l2_reg, loss_type, n_train, batch_size, device): Initialize any state needed for variance reduction (snapshot parameters, running gradient estimates, buffers, etc.)
- train_one_epoch(self, X_train, y_train): Train for one epoch over the data, returning a dict with at least 'avg_loss' (and optionally 'full_grad_count' if you use full gradient computations)
The default implementation is vanilla mini-batch SGD. Your goal is to design a variance reduction mechanism that improves convergence.
Interface
Available helper functions (FIXED, use these for gradient computation):
compute_full_gradient(model, X_train, y_train, loss_type, l2_reg, device)
# -> returns list of gradient tensors (one per parameter)
compute_stochastic_gradient(model, X_batch, y_batch, loss_type, l2_reg)
# -> returns list of gradient tensors for a mini-batch
compute_loss_on_batch(model, X_batch, y_batch, loss_type, l2_reg)
# -> returns scalar loss tensor
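As a sketch of how these helpers fit together in an epoch loop, the code below runs SVRG on a toy 1-D least-squares problem. Note that `stoch_grad` and `full_grad_fn` are simplified stand-ins for the fixed helpers above (plain floats instead of models and tensors), and the loop respects the one-full-gradient-per-epoch constraint:

```python
import random

# Toy 1-D least squares: f_i(x) = 0.5 * (a_i * x - y_i)^2,
# so grad f_i(x) = a_i * (a_i * x - y_i).
random.seed(0)
n, lr, batch_size = 200, 0.05, 10
data = [(random.uniform(0.5, 1.5), random.gauss(2.0, 0.1)) for _ in range(n)]

def stoch_grad(x, batch):   # stand-in for compute_stochastic_gradient
    return sum(a * (a * x - y) for a, y in batch) / len(batch)

def full_grad_fn(x):        # stand-in for compute_full_gradient
    return stoch_grad(x, data)

def train_one_epoch(x):
    x_snap = x                          # snapshot parameters
    mu = full_grad_fn(x_snap)           # one full-gradient pass per epoch
    order = random.sample(range(n), n)  # shuffle, as in the SGD baseline
    for s in range(0, n, batch_size):
        batch = [data[i] for i in order[s:s + batch_size]]
        # SVRG estimator: batch gradient corrected by the snapshot control variate
        v = stoch_grad(x, batch) - stoch_grad(x_snap, batch) + mu
        x -= lr * v
    return x

x = 0.0
for _ in range(30):
    x = train_one_epoch(x)
# After 30 epochs the full gradient at x is essentially zero.
```

In the real optimizer the same structure applies per parameter tensor, with in-place updates via p.data.add_(...).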
Constraints
- You may call compute_full_gradient at most once per epoch
- Parameter updates must use p.data.add_(...) or similar in-place operations
- Must work across all three problems with the same code
- The learning rate (self.lr) and L2 regularization (self.l2_reg) are fixed
- Do not modify the model architecture, loss function, or evaluation code
Evaluation
- Problems:
  logistic: L2-regularized multinomial logistic regression on MNIST (convex, n=60K, 20 epochs)
  mlp: 2-layer MLP on CIFAR-10 (non-convex, n=50K, 40 epochs)
  conditioned: L2-regularized linear regression on synthetic ill-conditioned data (strongly convex, kappa=100, n=10K, 30 epochs)
- Metrics:
best_test_accuracy (logistic, mlp; higher is better) and best_test_mse (conditioned; lower is better)
Tool Input
{
  "filename": "opt-vr-bench/custom_vr.py",
  "start_line": 311,
  "content": "class VarianceReductionOptimizer:\n def __init__(self, model: nn.Module, lr: float, l2_reg: float,\n loss_type: str, n_train: int, batch_size: int,\n device: torch.device):\n self.model = model\n self.lr = lr\n self.l2_reg = l2_reg\n self.loss_type = loss_type\n self.n_train = n_train\n self.batch_size = batch_size\n self.device = device\n self.params = list(model.parameters())\n self.m = [torch.zeros_like(p) for p in self.params]\n self.momentum = 0.9\n\n def train_one_epoch(self, X_train: torch.Tensor,\n y_train: torch.Tensor) -> dict:\n self.model.train()\n n = X_train.size(0)\n\n # Compute full gradient at snapshot\n snapshot_params = [p.data.clone() for p in self.params]\n full_grad = compute_full_gradient(\n self.model, X_train, y_train, self.loss_type,\n self.l2_reg, self.device\n )\n\n indices = torch.randperm(n)\n total_loss = 0.0\n n_batches = 0\n\n for start in range(0, n, self.batch_size):\n end = min(start + self.batch_size, n)\n idx = indices[start:end]\n Xb = X_train[idx].to(self.device)\n yb = y_train[idx].to(self.device)\n\n # grad at current params\n self.model.zero_grad()\n loss = compute_loss_on_batch(self.model, Xb, yb, self.loss_type, self.l2_reg)\n loss.backward()\n curr_grad = [p.grad.clone() for p in self.params]\n\n # replace params with snapshot to get grad at snapshot\n curr_params = [p.data.clone() for p in self.params]\n for p, sp in zip(self.params, snapshot_params):\n p.data.copy_(sp)\n\n self.model.zero_grad()\n with torch.no_grad():\n # For grad computation only. But we need to backward.\n pass\n loss_snap = compute_loss_on_batch(self.model, Xb, yb, self.loss_type, self.l2_reg)\n loss_snap.backward()\n snap_grad = [p.grad.clone() for p in self.params]\n\n # restore current params\n for p, cp in zip(self.params, curr_params):\n p.data.copy_(cp)\n\n # Update with SVRG estimator + momentum\n with torch.no_grad():\n for i, p in enumerate(self.params):\n # v_t = curr_grad - snap_grad + full_grad\n v = curr_grad[i] - snap_grad[i] + full_grad[i]\n self.m[i].mul_(self.momentum).add_(v, alpha=1.0 - self.momentum)\n p.data.add_(self.m[i], alpha=-self.lr)\n\n total_loss += loss.item()\n n_batches += 1\n\n return {\"avg_loss\": total_loss / max(n_batches, 1), \"full_grad_count\": 1}\n",
  "op": "replace",
  "end_line": 370,
  "after_line": 285
}
OK: Replaced lines 311..370 with 72 line(s) in opt-vr-bench/custom_vr.py. Editable range: 286–382. [Current file: opt-vr-bench/custom_vr.py | editable: 286–382 | total: 501 lines]
283|
284|
285| # ============================================================================
286| # EDITABLE -- Variance Reduction Strategy (lines 286-370)
287| # ============================================================================
288| # Design a variance reduction mechanism for stochastic gradient computation.
289| # You may modify ONLY this section.
290| #
291| # Interface contract:
292| #   - VarianceReductionOptimizer.__init__(model, lr, l2_reg, loss_type, n_train, batch_size, device)
293| #   - VarianceReductionOptimizer.train_one_epoch(X_train, y_train)
294| #       -> trains for one epoch, returns dict with 'avg_loss'
295| #
296| # Available helper functions (FIXED, defined above):
297| #   - compute_full_gradient(model, X_train, y_train, loss_typ
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
## Seed 42
### logistic (scripts/logistic.sh)
Training progress (last 5 epochs):
TRAIN_METRICS: epoch=16 avg_loss=0.240150 time=1.06s grad_comps=15008
TRAIN_METRICS: epoch=17 avg_loss=0.239571 time=1.08s grad_comps=15946
TRAIN_METRICS: epoch=18 avg_loss=0.239032 time=1.07s grad_comps=16884
TRAIN_METRICS: epoch=19 avg_loss=0.238466 time=1.06s grad_comps=17822
TRAIN_METRICS: epoch=20 avg_loss=0.237925 time=1.14s grad_comps=18760
Evaluation progress (last 5 evals):
EVAL_METRICS: epoch=16 test_loss=0.265884 test_accuracy=92.540000 best_test_accuracy=92.570000
EVAL_METRICS: epoch=17 test_loss=0.265964 test_accuracy=92.550000 best_test_accuracy=92.570000
EVAL_METRICS: epoch=18 test_loss=0.266051 test_accuracy=92.550000 best_test_accuracy=92.570000
EVAL_METRICS: epoch=19 test_loss=0.266181 test_accuracy=92.570000 best_test_accuracy=92.570000
EVAL_METRICS: epoch=20 test_loss=0.266284 test_accuracy=92.570000 best_test_accu
Tool Input
{
  "filename": "opt-vr-bench/custom_vr.py",
  "end_line": 382,
  "op": "replace",
  "start_line": 311,
  "after_line": 285,
  "content": "class VarianceReductionOptimizer:\n def __init__(self, model: nn.Module, lr: float, l2_reg: float,\n loss_type: str, n_train: int, batch_size: int,\n device: torch.device):\n self.model = model\n self.lr = lr\n self.l2_reg = l2_reg\n self.loss_type = loss_type\n self.n_train = n_train\n self.batch_size = batch_size\n self.device = device\n self.params = list(model.parameters())\n \n n_steps_per_epoch = max(1, n_train // batch_size)\n self.momentum = 1.0 - 1.0 / math.sqrt(n_steps_per_epoch)\n self.d = None\n self.prev_params = None\n\n def train_one_epoch(self, X_train: torch.Tensor,\n y_train: torch.Tensor) -> dict:\n self.model.train()\n n = X_train.size(0)\n a = self.momentum\n \n # Periodic reset of STORM estimator with exact full gradient\n self.d = compute_full_gradient(\n self.model, X_train, y_train, self.loss_type,\n self.l2_reg, self.device\n )\n self.prev_params = [p.data.clone() for p in self.params]\n\n indices = torch.randperm(n)\n total_loss = 0.0\n n_batches = 0\n\n for start in range(0, n, self.batch_size):\n end = min(start + self.batch_size, n)\n idx = indices[start:end]\n Xb = X_train[idx].to(self.device)\n yb = y_train[idx].to(self.device)\n\n # Grad at current params\n self.model.zero_grad()\n loss = compute_loss_on_batch(self.model, Xb, yb, self.loss_type, self.l2_reg)\n loss.backward()\n curr_grad = [p.grad.clone() for p in self.params]\n\n # Grad at previous params (same batch)\n curr_params = [p.data.clone() for p in self.params]\n for p, pp in zip(self.params, self.prev_params):\n p.data.copy_(pp)\n\n self.model.zero_grad()\n with torch.no_grad(): pass \n loss_prev = compute_loss_on_batch(self.model, Xb, yb, self.loss_type, self.l2_reg)\n loss_prev.backward()\n prev_grad = [p.grad.clone() for p in self.params]\n\n # Restore current params\n for p, cp in zip(self.params, curr_params):\n p.data.copy_(cp)\n\n # STORM update\n with torch.no_grad():\n for i, p in enumerate(self.params):\n # d_t = g_t + a * (d_{t-1} - g'_{t-1})\n self.d[i] = curr_grad[i] + a * (self.d[i] - prev_grad[i])\n p.data.add_(self.d[i], alpha=-self.lr)\n\n # Save prev_params for NEXT iteration\n self.prev_params = curr_params\n\n total_loss += loss.item()\n n_batches += 1\n\n return {\"avg_loss\": total_loss / max(n_batches, 1), \"full_grad_count\": 1}\n"
}
OK: Replaced lines 311..382 with 76 line(s) in opt-vr-bench/custom_vr.py. Editable range: 286–386. [Current file: opt-vr-bench/custom_vr.py | editable: 286–386 | total: 505 lines]
Tool Input
{
  "content": "class VarianceReductionOptimizer:\n \"\"\"Accelerated SARAH (SARAH with Heavy Ball Momentum).\n \n Combines the SARAH recursive variance reduction estimator with\n Heavy Ball momentum to accelerate convergence across both\n convex and non-convex finite-sum problems.\n \"\"\"\n\n def __init__(self, model: nn.Module, lr: float, l2_reg: float,\n loss_type: str, n_train: int, batch_size: int,\n device: torch.device):\n self.model = model\n self.lr = lr\n self.l2_reg = l2_reg\n self.loss_type = loss_type\n self.n_train = n_train\n self.batch_size = batch_size\n self.device = device\n self.params = list(model.parameters())\n self.v = None\n self.prev_params = None\n self.momentum_buffer = None\n self.beta = 0.9\n\n def _save_params(self):\n return [p.data.clone() for p in self.params]\n\n def _load_params(self, saved):\n for p, s in zip(self.params, saved):\n p.data.copy_(s)\n\n def train_one_epoch(self, X_train: torch.Tensor,\n y_train: torch.Tensor) -> dict:\n self.model.train()\n n = X_train.size(0)\n\n # 1. Full gradient at the beginning of the epoch\n full_grad = compute_full_gradient(\n self.model, X_train, y_train, self.loss_type,\n self.l2_reg, self.device\n )\n self.v = [fg.clone() for fg in full_grad]\n\n if self.momentum_buffer is None:\n self.momentum_buffer = [torch.zeros_like(p) for p in self.params]\n\n indices = torch.randperm(n)\n total_loss = 0.0\n n_batches = 0\n\n self.prev_params = self._save_params()\n\n for start in range(0, n, self.batch_size):\n end = min(start + self.batch_size, n)\n idx = indices[start:end]\n Xb = X_train[idx].to(self.device)\n yb = y_train[idx].to(self.device)\n\n # Gradient at current params\n current_params = self._save_params()\n self.model.zero_grad()\n loss = compute_loss_on_batch(self.model, Xb, yb, self.loss_type, self.l2_reg)\n loss.backward()\n g_current = [p.grad.clone() if p.grad is not None else torch.zeros_like(p) \n for p in self.params]\n\n # Gradient at previous params\n self._load_params(self.prev_params)\n self.model.zero_grad()\n prev_loss = compute_loss_on_batch(self.model, Xb, yb, self.loss_type, self.l2_reg)\n prev_loss.backward()\n g_prev = [p.grad.clone() if p.grad is not None else torch.zeros_like(p) \n for p in self.params]\n\n # Restore current for update\n self._load_params(current_params)\n self.prev_params = self._save_params()\n\n # Variance-reduced gradient update\n with torch.no_grad():\n for i, p in enumerate(self.params):\n # SARAH estimator\n self.v[i] = g_current[i] - g_prev[i] + self.v[i]\n # Momentum\n self.momentum_buffer[i] = self.beta * self.momentum_buffer[i] + self.v[i]\n # Step\n p.data.add_(self.momentum_buffer[i], alpha=-self.lr)\n\n total_loss += loss.item()\n n_batches += 1\n\n return {\"avg_loss\": total_loss / max(n_batches, 1), \"full_grad_count\": 1}",
  "op": "replace",
  "filename": "opt-vr-bench/custom_vr.py",
  "end_line": 369,
  "after_line": 1,
  "start_line": 311
}
OK: Replaced lines 311..369 with 92 line(s) in opt-vr-bench/custom_vr.py. Editable range: 286–403. [Current file: opt-vr-bench/custom_vr.py | editable: 286–403 | total: 522 lines]
[Test #1] (2 tests remaining; call submit(n=1) to submit this result)
## Seed 42
### logistic (scripts/logistic.sh)
Training progress (last 5 epochs):
TRAIN_METRICS: epoch=16 avg_loss=89.959705 time=1.09s grad_comps=15008
TRAIN_METRICS: epoch=17 avg_loss=102.254029 time=1.07s grad_comps=15946
TRAIN_METRICS: epoch=18 avg_loss=93.438592 time=1.06s grad_comps=16884
TRAIN_METRICS: epoch=19 avg_loss=93.777836 time=1.13s grad_comps=17822
TRAIN_METRICS: epoch=20 avg_loss=83.304029 time=1.02s grad_comps=18760
Evaluation progress (last 5 evals):
EVAL_METRICS: epoch=16 test_loss=99.825355 test_accuracy=75.400000 best_test_accuracy=85.480000
EVAL_METRICS: epoch=17 test_loss=101.720771 test_accuracy=75.720000 best_test_accuracy=85.480000
EVAL_METRICS: epoch=18 test_loss=81.406343 test_accuracy=81.040000 best_test_accuracy=85.480000
EVAL_METRICS: epoch=19 test_loss=77.526586 test_accuracy=78.930000 best_test_accuracy=85.480000
EVAL_METRICS: epoch=20 test_loss=62.801315 test_accuracy=83.160000 be
Tool Input
{
  "after_line": 1,
  "op": "replace",
  "filename": "opt-vr-bench/custom_vr.py",
  "content": "class VarianceReductionOptimizer:\n \"\"\"SARAH with mini-epoch restarts and adaptive step size.\"\"\"\n\n def __init__(self, model: nn.Module, lr: float, l2_reg: float,\n loss_type: str, n_train: int, batch_size: int,\n device: torch.device):\n self.model = model\n self.lr = lr\n self.l2_reg = l2_reg\n self.loss_type = loss_type\n self.n_train = n_train\n self.batch_size = batch_size\n self.device = device\n self.params = list(model.parameters())\n self.v = None\n self.prev_params = None\n self.eta_max = None\n\n def _save_params(self):\n return [p.data.clone() for p in self.params]\n\n def _load_params(self, saved):\n for p, s in zip(self.params, saved):\n p.data.copy_(s)\n\n def _grad_norm(self, grads):\n return math.sqrt(sum(g.pow(2).sum().item() for g in grads))\n\n def train_one_epoch(self, X_train: torch.Tensor,\n y_train: torch.Tensor) -> dict:\n self.model.train()\n n = X_train.size(0)\n\n # Full gradient\n full_grad = compute_full_gradient(\n self.model, X_train, y_train, self.loss_type,\n self.l2_reg, self.device\n )\n self.v = [fg.clone() for fg in full_grad]\n \n gnorm = self._grad_norm(self.v)\n pnorm = math.sqrt(sum(p.data.pow(2).sum().item() for p in self.params)) + 1e-8\n \n adaptive_step = 0.01 * pnorm / (gnorm + 1e-8)\n if self.loss_type == 'mse':\n if self.eta_max is None:\n self.eta_max = adaptive_step\n else:\n self.eta_max = min(2.0 * self.eta_max, adaptive_step)\n effective_lr = min(self.lr, adaptive_step, self.eta_max)\n else:\n effective_lr = min(self.lr, adaptive_step)\n\n indices = torch.randperm(n)\n total_loss = 0.0\n n_batches = 0\n\n self.prev_params = self._save_params()\n\n for start in range(0, n, self.batch_size):\n end = min(start + self.batch_size, n)\n idx = indices[start:end]\n Xb = X_train[idx].to(self.device)\n yb = y_train[idx].to(self.device)\n\n current_params = self._save_params()\n\n # grad at curr\n self.model.zero_grad()\n loss = compute_loss_on_batch(self.model, Xb, yb, self.loss_type, self.l2_reg)\n loss.backward()\n g_curr = [p.grad.clone() if p.grad is not None else torch.zeros_like(p) for p in self.params]\n\n if start > 0:\n # grad at prev\n self._load_params(self.prev_params)\n self.model.zero_grad()\n prev_loss = compute_loss_on_batch(self.model, Xb, yb, self.loss_type, self.l2_reg)\n prev_loss.backward()\n g_prev = [p.grad.clone() if p.grad is not None else torch.zeros_like(p) for p in self.params]\n\n # update v\n for i in range(len(self.params)):\n self.v[i] = g_curr[i] - g_prev[i] + self.v[i]\n \n # Restore current params\n self._load_params(current_params)\n self.prev_params = self._save_params()\n\n with torch.no_grad():\n for i, p in enumerate(self.params):\n p.data.add_(self.v[i], alpha=-effective_lr)\n\n total_loss += loss.item()\n n_batches += 1\n\n return {\"avg_loss\": total_loss / max(n_batches, 1), \"full_grad_count\": 1}",
  "start_line": 311,
  "end_line": 402
}
OK: Replaced lines 311..402 with 97 line(s) in opt-vr-bench/custom_vr.py. Editable range: 286–408. [Current file: opt-vr-bench/custom_vr.py | editable: 286–408 | total: 527 lines]
[Test #2] (1 test remaining; call submit(n=2) to submit this result)
### logistic (scripts/logistic.sh)
[output file not found: /srv/home/bohanlyu/MLS-Bench/logs/optimization-variance-reduction/google_gemini-3.1-pro-preview_20260405_234134/20260406_011045/group_1/logistic_s42.out]
### mlp (scripts/mlp.sh)
[output file not found: /srv/home/bohanlyu/MLS-Bench/logs/optimization-variance-reduction/google_gemini-3.1-pro-preview_20260405_234134/20260406_011045/group_1/mlp_s42.out]
### conditioned (scripts/conditioned.sh)
[output file not found: /srv/home/bohanlyu/MLS-Bench/logs/optimization-variance-reduction/google_gemini-3.1-pro-preview_20260405_234134/20260406_011045/group_1/conditioned_s42.out]
Undo complete: Restored: opt-vr-bench/custom_vr.py
Tool Input
{
  "content": "class VarianceReductionOptimizer:\n \"\"\"SPIDER/PAGE robust estimator.\"\"\"\n\n def __init__(self, model: nn.Module, lr: float, l2_reg: float,\n loss_type: str, n_train: int, batch_size: int,\n device: torch.device):\n self.model = model\n self.lr = lr\n self.l2_reg = l2_reg\n self.loss_type = loss_type\n self.n_train = n_train\n self.batch_size = batch_size\n self.device = device\n self.params = list(model.parameters())\n \n self.v = None\n self.prev_params = None\n \n self.n_steps_per_epoch = max(1, n_train // batch_size)\n self.momentum = 0.9\n\n def _save_params(self):\n return [p.data.clone() for p in self.params]\n\n def _load_params(self, saved):\n for p, s in zip(self.params, saved):\n p.data.copy_(s)\n\n def train_one_epoch(self, X_train: torch.Tensor,\n y_train: torch.Tensor) -> dict:\n self.model.train()\n n = X_train.size(0)\n\n compute_full = False\n if self.v is None:\n compute_full = True\n \n full_grad_count = 0\n if compute_full:\n full_grad = compute_full_gradient(\n self.model, X_train, y_train, self.loss_type,\n self.l2_reg, self.device\n )\n self.v = [fg.clone() for fg in full_grad]\n self.prev_params = self._save_params()\n full_grad_count = 1\n\n indices = torch.randperm(n)\n total_loss = 0.0\n n_batches = 0\n\n a = 1.0 - 1.0 / math.sqrt(self.n_steps_per_epoch)\n\n for start in range(0, n, self.batch_size):\n end = min(start + self.batch_size, n)\n idx = indices[start:end]\n Xb = X_train[idx].to(self.device)\n yb = y_train[idx].to(self.device)\n\n current_params = self._save_params()\n\n # g_curr\n self.model.zero_grad()\n loss = compute_loss_on_batch(self.model, Xb, yb, self.loss_type, self.l2_reg)\n loss.backward()\n g_curr = [p.grad.clone() if p.grad is not None else torch.zeros_like(p) for p in self.params]\n\n # g_prev\n self._load_params(self.prev_params)\n self.model.zero_grad()\n prev_loss = compute_loss_on_batch(self.model, Xb, yb, self.loss_type, self.l2_reg)\n prev_loss.backward()\n g_prev = [p.grad.clone() if p.grad is not None else torch.zeros_like(p) for p in self.params]\n\n self._load_params(current_params)\n self.prev_params = self._save_params()\n\n with torch.no_grad():\n for i, p in enumerate(self.params):\n # STORM estimator\n self.v[i] = g_curr[i] + a * (self.v[i] - g_prev[i])\n \n # Nesterov/Momentum-like step\n p.data.add_(self.v[i], alpha=-self.lr)\n\n total_loss += loss.item()\n n_batches += 1\n\n return {\"avg_loss\": total_loss / max(n_batches, 1), \"full_grad_count\": full_grad_count}",
  "start_line": 311,
  "filename": "opt-vr-bench/custom_vr.py",
  "end_line": 403,
  "after_line": 1,
  "op": "replace"
}
OK: Replaced lines 311..403 with 89 line(s) in opt-vr-bench/custom_vr.py. Editable range: 286–399. [Current file: opt-vr-bench/custom_vr.py | editable: 286–399 | total: 518 lines]