# dl-lr-schedule

## Description

DL Learning Rate Schedule Design

## Research Question
Design a novel learning rate schedule for training deep convolutional neural networks that improves convergence speed and final test accuracy across different architectures and datasets.
## Background

Learning rate scheduling is critical for training deep neural networks effectively. A fixed learning rate often leads to suboptimal results: too high a rate causes instability, while too low a rate slows convergence. Classic schedules include:
- Step decay (He et al., 2016): Divide LR by 10 at fixed milestones (e.g., 50% and 75% of training)
- Cosine annealing (Loshchilov & Hutter, 2017): Smooth decay following a cosine curve from base_lr to 0
- Warmup + cosine (Goyal et al., 2017): Linear warmup phase followed by cosine decay, stabilizes large-batch training
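The three classic schedules above can be sketched as follows (the 50%/75% milestones and the 5-epoch warmup are common defaults assumed here for illustration, not values fixed by this benchmark):

```python
import math

def step_decay(epoch, total_epochs, base_lr):
    """Step decay: divide LR by 10 at 50% and 75% of training (He et al., 2016)."""
    if epoch < 0.5 * total_epochs:
        return base_lr
    if epoch < 0.75 * total_epochs:
        return base_lr / 10
    return base_lr / 100

def cosine_annealing(epoch, total_epochs, base_lr):
    """Smooth decay from base_lr to 0 along a cosine curve (Loshchilov & Hutter, 2017)."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

def warmup_cosine(epoch, total_epochs, base_lr, warmup_epochs=5):
    """Linear warmup, then cosine decay (Goyal et al., 2017). warmup_epochs=5 is an assumed default."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```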
However, these schedules are designed without considering architecture-specific properties (depth, residual connections, batch normalization) or dataset characteristics. There is room to design schedules that adapt to the training context.
## What You Can Modify

The `get_lr(epoch, total_epochs, base_lr, config)` function (lines 155-178) in `custom_schedule.py`. This function is called once per epoch to compute the learning rate.
You can modify:
- The shape of the LR decay curve (cosine, polynomial, exponential, linear, piecewise, ...)
- Whether and how long to warm up
- The minimum/final learning rate
- Architecture-aware scheduling (config provides 'arch' and 'dataset')
- Any epoch-dependent logic (cyclic restarts, sharp transitions, plateau regions, ...)
The `config` dict provides: `arch` (str: `'resnet20'`, `'resnet56'`, `'mobilenetv2'`) and `dataset` (str: `'cifar10'`, `'cifar100'`, `'fmnist'`).
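As an illustration only, a submitted `get_lr` combining several of the knobs above might look like the sketch below. The warmup lengths, the LR floor, and the architecture-specific branch are hypothetical choices for this example, not values taken from the benchmark:

```python
import math

def get_lr(epoch, total_epochs, base_lr, config):
    """One possible schedule: linear warmup, then cosine decay to a small floor.

    Warmup length and min_lr are illustrative assumptions; the deeper net gets
    a longer warmup on the assumption that its early training is less stable.
    """
    warmup = 10 if config.get('arch') == 'resnet56' else 5  # assumed values
    min_lr = 1e-4                                           # assumed floor
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    progress = (epoch - warmup) / max(1, total_epochs - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```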
## Evaluation
- Metric: Best test accuracy (%, higher is better)
- Architectures & datasets:
  - ResNet-20 on CIFAR-10 (shallow residual, 10 classes)
  - ResNet-56 on CIFAR-100 (deeper residual, 100 classes)
  - MobileNetV2 on FashionMNIST (lightweight inverted-residual, 10 classes) — hidden, evaluated on final submission only
- Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), 200 epochs, NO built-in scheduler
- Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
- Weight init: Kaiming normal (fixed, not editable)
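Since there is no built-in scheduler, the fixed loop calls `get_lr()` once per epoch and writes the result into the optimizer's parameter groups. The sketch below shows only that call pattern; the dict-based optimizer stand-in and the cosine placeholder schedule are not the benchmark's actual code (which uses `torch.optim.SGD(lr=0.1, momentum=0.9, weight_decay=5e-4)`):

```python
import math

def get_lr(epoch, total_epochs, base_lr, config):
    # Placeholder cosine schedule; the benchmark uses the editable
    # version in custom_schedule.py instead.
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

def run_schedule(total_epochs, base_lr, config):
    """Mimic the fixed loop's per-epoch LR assignment and record the values."""
    optimizer = {'param_groups': [{'lr': base_lr}]}  # stand-in for a PyTorch optimizer
    trace = []
    for epoch in range(total_epochs):
        lr = get_lr(epoch, total_epochs, base_lr, config)
        for group in optimizer['param_groups']:
            group['lr'] = lr  # applied before the epoch's minibatch updates
        trace.append(optimizer['param_groups'][0]['lr'])
    return trace
```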
## Code

```python
"""CV Learning Rate Schedule Benchmark.

Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST to evaluate
learning rate schedule strategies.

FIXED: Model architectures, data pipeline, training loop, optimizer.
EDITABLE: get_lr() function.

Usage:
    python custom_schedule.py --arch resnet20 --dataset cifar10 --seed 42
"""

import argparse
import math
import os
```
## Results

| Model | Type | ResNet-20 / CIFAR-10 test acc (%) ↑ | ResNet-56 / CIFAR-100 test acc (%) ↑ | MobileNetV2 / FashionMNIST test acc (%) ↑ |
|---|---|---|---|---|
| cosine | baseline | 93.030 | 71.070 | 94.700 |
| one_cycle | baseline | 92.310 | 71.570 | 93.930 |
| warmup_cosine | baseline | 92.710 | 72.430 | 94.830 |
| anthropic/claude-opus-4.6 | vanilla | 92.680 | 72.970 | 94.610 |
| deepseek-reasoner | vanilla | 92.900 | 72.630 | 94.550 |
| google/gemini-3.1-pro-preview | vanilla | 92.810 | 73.580 | 94.910 |
| openai/gpt-5.4 | vanilla | 92.580 | 72.490 | 94.690 |
| qwen/qwen3.6-plus | vanilla | 92.840 | 72.030 | 94.240 |
| anthropic/claude-opus-4.6 | agent | 92.890 | 73.110 | 94.430 |
| deepseek-reasoner | agent | 92.900 | 72.630 | 94.550 |
| google/gemini-3.1-pro-preview | agent | 92.820 | 73.500 | 95.130 |
| openai/gpt-5.4 | agent | 92.960 | 73.330 | 94.650 |
| qwen/qwen3.6-plus | agent | 92.980 | 73.010 | 94.240 |