dl-regularization
Description
DL Regularization Strategy Design
Research Question
Design a novel regularization strategy for deep convolutional neural networks that improves generalization (test accuracy) across different architectures and datasets.
Background
Regularization is essential for preventing overfitting and improving generalization in deep neural networks. Beyond standard weight decay (L2 penalty), many regularization techniques have been proposed:
- DropBlock-inspired spatial co-activation penalty (Ghiasi et al., 2018): Penalizes local spatial co-activation in feature maps, discouraging reliance on contiguous regions — captures the core insight of DropBlock as a loss-based regularizer
- Confidence penalty (Pereyra et al., 2017): Penalizes low-entropy output distributions to prevent overconfidence
- Orthogonal regularization (Brock et al., 2017): Encourages weight matrices to be orthogonal, preserving gradient flow
However, these methods typically apply a fixed penalty throughout training and do not adapt to training dynamics, model architecture, or the relationship between different layer types. There is room to design regularization strategies that are more adaptive, architecture-aware, or that combine multiple complementary penalties.
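As a concrete reference point, the confidence penalty above simply adds the negative entropy of the softmax distribution to the loss. A minimal sketch (the weight `beta` shown is an illustrative choice, not a value from this benchmark):

```python
import torch
import torch.nn.functional as F

def confidence_penalty(logits: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """Penalize low-entropy (overconfident) output distributions.

    Returns beta * (-H(p)) averaged over the batch, where p = softmax(logits).
    Minimizing this term pushes predictions toward higher entropy.
    """
    log_p = F.log_softmax(logits, dim=-1)   # [B, C]
    p = log_p.exp()
    neg_entropy = (p * log_p).sum(dim=-1)   # -H(p) per example
    return beta * neg_entropy.mean()
```

Uniform logits (maximum entropy) give the most negative penalty, while a sharply peaked distribution gives a penalty near zero, so overconfident outputs cost more.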
What You Can Modify
The `compute_regularization(model, inputs, outputs, targets, config)` function (lines 155-183) in `custom_reg.py`. This function is called at every training step and returns a scalar loss that is added to the cross-entropy loss.
You can use:
- `model`: the full `nn.Module` — iterate over `model.named_parameters()` or `model.named_modules()` for weight-based penalties
- `inputs`: `[B, 3, 32, 32]` — the input batch (for input-dependent regularization)
- `outputs`: `[B, num_classes]` — the model logits (for output-based penalties like confidence/entropy)
- `targets`: `[B]` — integer class labels
- `config`: dict with `num_classes` (int), `epoch` (int, 0-indexed), `total_epochs` (int)
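A trivial implementation that exercises this interface might look like the following sketch. The L1-on-conv-weights penalty and the `l1_coef` value are illustrative assumptions, not the benchmark's method:

```python
import torch
import torch.nn as nn

def compute_regularization(model: nn.Module,
                           inputs: torch.Tensor,    # [B, 3, 32, 32]
                           outputs: torch.Tensor,   # [B, num_classes]
                           targets: torch.Tensor,   # [B]
                           config: dict) -> torch.Tensor:
    """Return a scalar term added to the cross-entropy loss each step."""
    l1_coef = 1e-5  # illustrative coefficient, would need tuning
    # Architecture-aware example: L1 norm over conv weights only.
    penalty = outputs.new_zeros(())
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            penalty = penalty + module.weight.abs().sum()
    return l1_coef * penalty
```

Note that the returned tensor must stay on the computation graph (as here) so gradients flow back into the penalized weights.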
Design ideas:
- Weight-based: L1/L2 norms, orthogonality, spectral norms, weight correlation
- Output-based: entropy, confidence penalty, label smoothing effect, logit penalties
- Activation-based: sparsity, diversity (requires forward hooks)
- Epoch-dependent: warm-up schedules, annealing, curriculum regularization
- Architecture-aware: different penalties for conv vs linear, depth-dependent scaling
Note: Standard L2 weight decay (5e-4) is already applied via the optimizer. Your regularization term is additional.
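For the epoch-dependent direction, a warm-up schedule can be built from `config` alone. This sketch ramps a penalty coefficient linearly over the first quarter of training; the schedule shape and the base coefficient are assumptions for illustration:

```python
def warmup_weight(config: dict, base: float = 0.05, frac: float = 0.25) -> float:
    """Linearly ramp a regularization coefficient from 0 to `base`
    over the first `frac` of training, then hold it constant.

    Uses only config["epoch"] (0-indexed) and config["total_epochs"].
    """
    warmup_epochs = max(1, int(frac * config["total_epochs"]))
    progress = min(1.0, config["epoch"] / warmup_epochs)
    return base * progress
```

The returned scalar would multiply whatever penalty `compute_regularization` produces, so regularization stays out of the way early in training and reaches full strength once the network has fit the easy structure.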
Evaluation
- Metric: Best test accuracy (%, higher is better)
- Architectures & datasets:
- ResNet-56 on CIFAR-100 (deep residual, 100 classes)
- VGG-16-BN on CIFAR-100 (deep non-residual with BatchNorm, 100 classes)
- MobileNetV2 on FashionMNIST (lightweight inverted-residual, 10 classes) — hidden, evaluated on final submission only
- Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
- Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
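In PyTorch terms, the fixed optimizer setup above corresponds roughly to the following sketch (for reference only; the actual loop lives in the fixed portion of `custom_reg.py`):

```python
import torch

def make_optimizer(model: torch.nn.Module, epochs: int = 200):
    """SGD + cosine annealing matching the fixed training recipe."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=5e-4)
    # Cosine schedule decays lr from 0.1 to ~0 over the full run.
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched
```

Because weight decay (5e-4) is applied here by the optimizer, any additional L2-style term returned from `compute_regularization` stacks on top of it.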
Code
```python
"""CV Regularization Benchmark.

Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST to evaluate
regularization strategies.

FIXED: Model architectures, weight initialization, data pipeline, training loop.
EDITABLE: compute_regularization() function.

Usage:
    python custom_reg.py --arch resnet20 --dataset cifar10 --seed 42
"""

import argparse
import math
import os
```
Results
| Model | Type | ResNet-56 / CIFAR-100 test acc (%) ↑ | VGG-16-BN / CIFAR-100 test acc (%) ↑ | MobileNetV2 / FashionMNIST test acc (%) ↑ |
|---|---|---|---|---|
| confidence_penalty | baseline | 72.660 | 74.310 | 94.900 |
| dropblock | baseline | 72.210 | 73.910 | 91.840 |
| dropblock | baseline | 71.360 | 1.000 | 94.660 |
| dropblock | baseline | 72.650 | 1.000 | 94.450 |
| dropblock | baseline | 72.350 | 72.740 | 94.290 |
| dropblock | baseline | 71.730 | 1.000 | 94.570 |
| dropblock | baseline | 72.450 | 73.370 | 94.690 |
| orthogonal_reg | baseline | 73.160 | 74.030 | 94.850 |
| anthropic/claude-opus-4.6 | vanilla | 72.490 | 1.000 | 94.410 |
| deepseek-reasoner | vanilla | - | 1.000 | - |
| google/gemini-3.1-pro-preview | vanilla | 75.250 | 76.520 | 95.480 |
| openai/gpt-5.4 | vanilla | 72.810 | 1.000 | 94.460 |
| qwen/qwen3.6-plus | vanilla | 72.770 | 73.520 | 94.570 |
| anthropic/claude-opus-4.6 | agent | 71.710 | 74.180 | 94.540 |
| deepseek-reasoner | agent | - | 72.980 | 94.740 |
| google/gemini-3.1-pro-preview | agent | 75.250 | 76.520 | 95.480 |
| openai/gpt-5.4 | agent | 72.180 | 72.970 | 94.710 |
| qwen/qwen3.6-plus | agent | 72.770 | 73.520 | 94.570 |