Agent Conversation: deepseek-reasoner
Task: dl-lr-schedule
DL Learning Rate Schedule Design
Research Question
Design a novel learning rate schedule for training deep convolutional neural networks that improves convergence speed and final test accuracy across different architectures and datasets.
Background
Learning rate scheduling is critical for training deep neural networks effectively. A fixed learning rate is often suboptimal: too high a rate causes instability, while too low a rate slows convergence. Classic schedules include:
- Step decay (He et al., 2016): Divide LR by 10 at fixed milestones (e.g., 50% and 75% of training)
- Cosine annealing (Loshchilov & Hutter, 2017): Smooth decay following a cosine curve from base_lr to 0
- Warmup + cosine (Goyal et al., 2017): Linear warmup phase followed by cosine decay, stabilizes large-batch training
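These three baselines can be sketched as pure functions of the epoch index. The milestone fractions and the 5-epoch warmup below are illustrative defaults, not values pinned down by the cited papers:

```python
import math

def step_decay(epoch, total_epochs, base_lr):
    # Divide LR by 10 at 50% and 75% of training (step-decay style)
    factor = 1.0
    if epoch >= total_epochs * 0.5:
        factor *= 0.1
    if epoch >= total_epochs * 0.75:
        factor *= 0.1
    return base_lr * factor

def cosine_annealing(epoch, total_epochs, base_lr):
    # Smooth cosine decay from base_lr toward 0
    return base_lr * 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))

def warmup_cosine(epoch, total_epochs, base_lr, warmup_epochs=5):
    # Linear warmup to base_lr, then cosine decay toward 0
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))
```

All three map an epoch index to a plain float, which is exactly the shape of hook this benchmark exposes.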
However, these schedules are designed without considering architecture-specific properties (depth, residual connections, batch normalization) or dataset characteristics. There is room to design schedules that adapt to the training context.
What You Can Modify
The get_lr(epoch, total_epochs, base_lr, config) function (the editable region, lines 246-269) in custom_schedule.py. This function is called once per epoch to compute the learning rate.
You can modify:
- The shape of the LR decay curve (cosine, polynomial, exponential, linear, piecewise, ...)
- Whether and how long to warm up
- The minimum/final learning rate
- Architecture-aware scheduling (config provides 'arch' and 'dataset')
- Any epoch-dependent logic (cyclic restarts, sharp transitions, plateau regions, ...)
The config dict provides: arch (str: 'resnet20', 'resnet56', 'mobilenetv2'), dataset (str: 'cifar10', 'cifar100', 'fmnist').
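A minimal get_lr that respects this interface might look like the following. It is purely illustrative (plain cosine decay that ignores config), not the benchmark's baseline:

```python
import math

def get_lr(epoch, total_epochs, base_lr, config):
    # Illustrative only: cosine decay from base_lr to 0,
    # ignoring the arch/dataset hints in config.
    progress = epoch / max(total_epochs - 1, 1)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

config = {'arch': 'resnet20', 'dataset': 'cifar10'}
lrs = [get_lr(e, 200, 0.1, config) for e in range(200)]
```

Here the first epoch returns base_lr and the last returns zero; an architecture-aware variant would branch on config['arch'] or config['dataset'] before choosing the curve.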
Evaluation
- Metric: Best test accuracy (%, higher is better)
- Architectures & datasets:
- ResNet-20 on CIFAR-10 (shallow residual, 10 classes)
- ResNet-56 on CIFAR-100 (deeper residual, 100 classes)
- MobileNetV2 on FashionMNIST (lightweight inverted-residual, 10 classes) — hidden, evaluated on final submission only
- Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), 200 epochs, NO built-in scheduler
- Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
- Weight init: Kaiming normal (fixed, not editable)
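Since the training loop is fixed and has no built-in scheduler, it presumably pushes get_lr()'s value into every SGD parameter group at the start of each epoch. A rough sketch of that wiring, with a stub standing in for torch.optim.SGD so the snippet runs without PyTorch (the helper names are assumptions, not the benchmark's actual code):

```python
class _StubOptimizer:
    """Stand-in exposing param_groups the way torch.optim.SGD does."""
    def __init__(self, lr):
        self.param_groups = [{'lr': lr}]

def set_lr(optimizer, lr):
    # The fixed loop presumably does something equivalent with the real SGD
    for group in optimizer.param_groups:
        group['lr'] = lr

def get_lr(epoch, total_epochs, base_lr, config):
    # Placeholder schedule: halve the LR every 50 epochs
    return base_lr * (0.5 ** (epoch // 50))

opt = _StubOptimizer(lr=0.1)
for epoch in range(200):
    set_lr(opt, get_lr(epoch, 200, 0.1, {'arch': 'resnet20', 'dataset': 'cifar10'}))
    # ... one epoch of training and evaluation would run here ...
```

Because get_lr is called once per epoch (not per step), within-epoch warmup or cycling is not expressible in this interface.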
pytorch-vision/custom_schedule.py [EDITABLE — lines 246–269 only]
1: """CV Learning Rate Schedule Benchmark.
2:
3: Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST to evaluate
4: learning rate schedule strategies.
5:
6: FIXED: Model architectures, data pipeline, training loop, optimizer.
7: EDITABLE: get_lr() function.
8:
9: Usage:
10: python custom_schedule.py --arch resnet20 --dataset cifar10 --seed 42
11: """
12:
13: import argparse
14: import math
15: import os
16: import time
17:
18: import torch
19: import torch.nn as nn
20: import torch.
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_schedule.py",
  "start_line": 246,
  "end_line": 269,
  "content": "def get_lr(epoch, total_epochs, base_lr, config):\n \"\"\"Compute learning rate for the given epoch.\n\n Called once per epoch to set the learning rate for all parameter groups.\n\n Args:\n epoch: current epoch (0-indexed, ranges from 0 to total_epochs-1)\n total_epochs: total number of training epochs\n base_lr: initial learning rate (from --lr flag, default 0.1)\n config: dict with keys:\n - arch: str ('resnet20', 'resnet56', 'vgg16bn', 'mobilenetv2')\n - dataset: str ('cifar10', 'cifar100', 'fmnist')\n\n Returns:\n float: learning rate to use for this epoch\n\n Design considerations:\n - Warmup phase to stabilize early training\n - Decay shape (step, cosine, polynomial, exponential, ...)\n - Final learning rate (decay to zero vs small constant)\n - Architecture/dataset-aware scheduling\n - Interaction with momentum and weight decay\n \"\"\"\n # Architecture-aware parameters\n arch = config['arch']\n dataset = config['dataset']\n \n # Warmup epochs: deeper networks need longer warmup\n if arch == 'resnet20':\n warmup_epochs = 5\n plateau_epochs = 20 # epochs at high LR after warmup\n decay_type = 'cosine'\n elif arch == 'resnet56':\n warmup_epochs = 10\n plateau_epochs = 30\n decay_type = 'cosine'\n elif arch == 'mobilenetv2':\n warmup_epochs = 3\n plateau_epochs = 15\n decay_type = 'cosine'\n else: # vgg16bn\n warmup_epochs = 7\n plateau_epochs = 25\n decay_type = 'cosine'\n \n # Adjust based on dataset difficulty\n if dataset == 'cifar100':\n warmup_epochs = int(warmup_epochs * 1.2) # longer warmup for more classes\n plateau_epochs = int(plateau_epochs * 1.1)\n elif dataset == 'fmnist':\n # FashionMNIST is simpler, shorter warmup\n warmup_epochs = max(2, warmup_epochs // 2)\n plateau_epochs = max(10, plateau_epochs // 2)\n \n # Ensure warmup + plateau doesn't exceed total epochs\n warmup_epochs = min(warmup_epochs, total_epochs // 4)\n plateau_epochs = min(plateau_epochs, total_epochs - warmup_epochs - 10) # leave room for decay\n \n # Phase determination\n if epoch < warmup_epochs:\n # Linear warmup from base_lr/10 to base_lr\n return base_lr * (epoch + 1) / warmup_epochs\n elif epoch < warmup_epochs + plateau_epochs:\n # Plateau at base_lr\n return base_lr\n else:\n # Decay phase\n decay_start_epoch = warmup_epochs + plateau_epochs\n decay_length = total_epochs - decay_start_epoch\n progress = (epoch - decay_start_epoch) / decay_length\n \n # Cosine decay to base_lr/100\n final_lr = base_lr / 100.0\n decay_lr = final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))\n \n # Add small cyclic perturbation (1% magnitude) to help escape local minima\n cycle = math.sin(2 * math.pi * 3 * progress) * 0.01 * decay_lr\n return decay_lr + cycle"
}
OK: Replaced lines 246..269 with 78 line(s) in pytorch-vision/custom_schedule.py. Editable range: 246–323.
[Current file: pytorch-vision/custom_schedule.py | editable: 246–323 | total: 490 lines]
243| # ============================================================================
244|
245| # -- EDITABLE REGION START (lines 246-269) ------------------------------------
246 def get_lr(epoch, total_epochs, base_lr, config):
247 """Compute learning rate for the given epoch.
248
249 Called once per epoch to set the learning rate for all parameter groups.
250
251 Args:
252 epoch: current epoch (0-indexed, ranges from 0 to total_epochs-1)
253 total_epochs: total number of training epochs
254 base_lr: initial learning rate (from --lr flag, default 0.1)
255 config: dict with keys:
256 - arch: str ('resnet20', 'resnet56', 'vgg16bn', 'mobilenetv2')
257 - dataset: str ('c
[Test #1] (2 tests remaining; call submit(n=N) to choose which test result to submit as final)
### resnet20-cifar10 (scripts/resnet20_cifar10.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.1353 train_acc=95.30 test_loss=0.3462 test_acc=89.52 lr=0.013946
TRAIN_METRICS: epoch=170 train_loss=0.0891 train_acc=97.06 test_loss=0.2942 test_acc=90.92 lr=0.008486
TRAIN_METRICS: epoch=180 train_loss=0.0470 train_acc=98.63 test_loss=0.2709 test_acc=92.03 lr=0.004442
TRAIN_METRICS: epoch=190 train_loss=0.0265 train_acc=99.39 test_loss=0.2693 test_acc=92.36 lr=0.001944
TRAIN_METRICS: epoch=200 train_loss=0.0192 train_acc=99.61 test_loss=0.2748 test_acc=92.90 lr=0.001007
Final metrics (resnet20-cifar10): test_acc_resnet20-cifar10=92.90
### resnet56-cifar100 (scripts/resnet56_cifar100.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.5476 train_acc=82.91 test_loss=1.3093 test_acc=66.49 lr=0.017295
TRAIN_METRICS: epoch=170 train_loss=0.3733 train_
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_schedule.py",
  "start_line": 246,
  "end_line": 323,
  "content": "def get_lr(epoch, total_epochs, base_lr, config):\n \"\"\"Adaptive OneCycle schedule with architecture/dataset-aware parameters.\n\n Phase 1 (warmup): cosine increase from base_lr/div_factor to base_lr.\n Phase 2 (annealing): cosine decrease from base_lr to final_lr.\n\n Parameters:\n pct_start: fraction of total epochs for warmup phase.\n Deeper networks have later peak (larger pct_start).\n div_factor: initial LR divisor, larger for more complex datasets.\n final_div: final LR divisor, larger for easier datasets.\n \"\"\"\n arch = config['arch']\n dataset = config['dataset']\n \n # Default OneCycle parameters (original)\n pct_start = 0.3\n div_factor = 25.0\n final_div = 25.0\n \n # Architecture adaptations\n if arch == 'resnet20':\n # Shallow network: peak earlier, less aggressive warmup\n pct_start = 0.25\n div_factor = 20.0\n final_div = 20.0\n elif arch == 'resnet56':\n # Deeper network: peak later, more aggressive warmup (lower initial LR)\n pct_start = 0.4\n div_factor = 30.0\n final_div = 30.0\n elif arch == 'mobilenetv2':\n # Lightweight network: peak earlier, moderate warmup\n pct_start = 0.2\n div_factor = 15.0\n final_div = 15.0\n else: # vgg16bn\n pct_start = 0.35\n div_factor = 25.0\n final_div = 25.0\n \n # Dataset adaptations\n if dataset == 'cifar100':\n # More classes: need slower warmup (larger div_factor) and smaller final LR\n div_factor *= 1.5\n final_div *= 1.5\n elif dataset == 'fmnist':\n # Simpler dataset: can use more aggressive schedule\n div_factor *= 0.8\n final_div *= 0.8\n \n # Compute LR\n min_lr = base_lr / div_factor\n final_lr = base_lr / final_div\n \n progress = epoch / max(total_epochs - 1, 1)\n \n if progress <= pct_start:\n # Warmup phase: cosine from min_lr to base_lr\n t = progress / pct_start\n return min_lr + (base_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * (1 - t)))\n else:\n # Anneal phase: cosine from base_lr to final_lr\n t = (progress - pct_start) / (1 - pct_start)\n return final_lr + (base_lr - final_lr) * 0.5 * (1 + math.cos(math.pi * t))"
}
OK: Replaced lines 246..323 with 65 line(s) in pytorch-vision/custom_schedule.py. Editable range: 246–310.
[Current file: pytorch-vision/custom_schedule.py | editable: 246–310 | total: 477 lines]
243| # ============================================================================
244|
245| # -- EDITABLE REGION START (lines 246-269) ------------------------------------
246 def get_lr(epoch, total_epochs, base_lr, config):
247 """Adaptive OneCycle schedule with architecture/dataset-aware parameters.
248
249 Phase 1 (warmup): cosine increase from base_lr/div_factor to base_lr.
250 Phase 2 (annealing): cosine decrease from base_lr to final_lr.
251
252 Parameters:
253 pct_start: fraction of total epochs for warmup phase.
254 Deeper networks have later peak (larger pct_start).
255 div_factor: initial LR divisor, larger for more complex datasets.
256 final_div: final LR divisor,
[Test #2] (1 test remaining; call submit(n=N) to choose which test result to submit as final)
### resnet20-cifar10 (scripts/resnet20_cifar10.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.1850 train_acc=93.61 test_loss=0.3772 test_acc=88.48 lr=0.020865
TRAIN_METRICS: epoch=170 train_loss=0.1391 train_acc=95.17 test_loss=0.3159 test_acc=90.18 lr=0.014160
TRAIN_METRICS: epoch=180 train_loss=0.0988 train_acc=96.68 test_loss=0.3008 test_acc=90.94 lr=0.009147
TRAIN_METRICS: epoch=190 train_loss=0.0652 train_acc=97.90 test_loss=0.3096 test_acc=91.32 lr=0.006048
TRAIN_METRICS: epoch=200 train_loss=0.0531 train_acc=98.31 test_loss=0.3074 test_acc=91.58 lr=0.005000
Final metrics (resnet20-cifar10): test_acc_resnet20-cifar10=91.69
### resnet56-cifar100 (scripts/resnet56_cifar100.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.7407 train_acc=77.57 test_loss=1.3541 test_acc=63.98 lr=0.026890
TRAIN_METRICS: epoch=170 train_loss=0.5449 train_a
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_schedule.py",
  "start_line": 246,
  "end_line": 310,
  "content": "def get_lr(epoch, total_epochs, base_lr, config):\n \"\"\"Adaptive Plateau Cosine (APC) schedule.\n \n Warmup (linear) → Plateau (constant high LR) → Cosine decay to final LR.\n \n Hyperparameters adapt to architecture depth and dataset complexity.\n \"\"\"\n arch = config['arch']\n dataset = config['dataset']\n \n # Warmup fraction of total epochs\n if arch == 'resnet20':\n warmup_ratio = 0.03\n plateau_ratio = 0.15\n elif arch == 'resnet56':\n warmup_ratio = 0.05\n plateau_ratio = 0.20\n elif arch == 'mobilenetv2':\n warmup_ratio = 0.02\n plateau_ratio = 0.10\n else: # vgg16bn\n warmup_ratio = 0.04\n plateau_ratio = 0.18\n \n # Dataset adjustments\n if dataset == 'cifar100':\n # More classes, need longer plateau for exploration\n plateau_ratio *= 1.2\n elif dataset == 'fmnist':\n # Simpler dataset, shorter plateau\n plateau_ratio *= 0.8\n \n # Ensure ratios are within reasonable bounds\n warmup_ratio = min(0.1, max(0.01, warmup_ratio))\n plateau_ratio = min(0.3, max(0.05, plateau_ratio))\n \n warmup_epochs = int(warmup_ratio * total_epochs)\n plateau_epochs = int(plateau_ratio * total_epochs)\n \n # Ensure we have at least 10 epochs for decay\n if warmup_epochs + plateau_epochs > total_epochs - 10:\n plateau_epochs = total_epochs - warmup_epochs - 10\n \n # Final LR = base_lr / 100\n final_lr = base_lr / 100.0\n \n # Phase determination\n if epoch < warmup_epochs:\n # Linear warmup from base_lr/10 to base_lr\n return base_lr * (epoch + 1) / warmup_epochs\n elif epoch < warmup_epochs + plateau_epochs:\n # Plateau at base_lr\n return base_lr\n else:\n # Cosine decay from base_lr to final_lr\n decay_start_epoch = warmup_epochs + plateau_epochs\n decay_length = total_epochs - decay_start_epoch\n progress = (epoch - decay_start_epoch) / decay_length\n return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))"
}
OK: Replaced lines 246..310 with 59 line(s) in pytorch-vision/custom_schedule.py. Editable range: 246–304.
[Current file: pytorch-vision/custom_schedule.py | editable: 246–304 | total: 471 lines]
246 def get_lr(epoch, total_epochs, base_lr, config):
247 """Adaptive Plateau Cosine (APC) schedule.
248
... (53 more lines) ...
302 decay_length = total_epochs - decay_start_epoch
303 progress = (epoch - decay_start_epoch) / decay_length
304 return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))
ERROR: Test budget exhausted (3/3). You MUST call submit(n=N) to choose which test result to submit as final.
Tool Input
{
  "n": 1
}
[submit] Submitting result from test #1 as final.
### resnet20-cifar10 (scripts/resnet20_cifar10.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.1353 train_acc=95.30 test_loss=0.3462 test_acc=89.52 lr=0.013946
TRAIN_METRICS: epoch=170 train_loss=0.0891 train_acc=97.06 test_loss=0.2942 test_acc=90.92 lr=0.008486
TRAIN_METRICS: epoch=180 train_loss=0.0470 train_acc=98.63 test_loss=0.2709 test_acc=92.03 lr=0.004442
TRAIN_METRICS: epoch=190 train_loss=0.0265 train_acc=99.39 test_loss=0.2693 test_acc=92.36 lr=0.001944
TRAIN_METRICS: epoch=200 train_loss=0.0192 train_acc=99.61 test_loss=0.2748 test_acc=92.90 lr=0.001007
Final metrics (resnet20-cifar10): test_acc_resnet20-cifar10=92.90
### resnet56-cifar100 (scripts/resnet56_cifar100.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.5476 train_acc=82.91 test_loss=1.3093 test_acc=66.49 lr=0.017295
TRAIN_METRICS: epoch=170 train_loss=0.3733 train_acc=88.35 test_loss=1.2388 test_acc=68.63 lr=