dl-regularization
Description
DL Regularization Strategy Design
Research Question
Design a novel regularization strategy for deep convolutional neural networks that improves generalization (test accuracy) across different architectures and datasets.
Background
Regularization is essential for preventing overfitting and improving generalization in deep neural networks. Beyond standard weight decay (L2 penalty), many regularization techniques have been proposed:
- DropBlock-inspired spatial co-activation penalty (Ghiasi et al., 2018): Penalizes local spatial co-activation in feature maps, discouraging reliance on contiguous regions — captures the core insight of DropBlock as a loss-based regularizer
- Confidence penalty (Pereyra et al., 2017): Penalizes low-entropy output distributions to prevent overconfidence
- Orthogonal regularization (Brock et al., 2017): Encourages weight matrices to be orthogonal, preserving gradient flow
However, these methods typically apply a fixed penalty throughout training and do not adapt to training dynamics, model architecture, or the relationship between different layer types. There is room to design regularization strategies that are more adaptive, architecture-aware, or that combine multiple complementary penalties.
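As a concrete reference point, the confidence penalty above simply adds the negative entropy of the softmax distribution to the loss. A minimal sketch (the weight `beta` shown is an illustrative choice, not a value from this benchmark):

```python
import torch
import torch.nn.functional as F

def confidence_penalty(logits: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """Penalize low-entropy (overconfident) output distributions.

    Returns beta * (-H(p)) averaged over the batch, where p = softmax(logits).
    Minimizing this term pushes predictions toward higher entropy.
    """
    log_p = F.log_softmax(logits, dim=-1)   # [B, C]
    p = log_p.exp()
    neg_entropy = (p * log_p).sum(dim=-1)   # -H(p) per example
    return beta * neg_entropy.mean()
```

Uniform logits (maximum entropy) give the most negative penalty, while a sharply peaked distribution gives a penalty near zero, so overconfident outputs cost more.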
What You Can Modify
The `compute_regularization(model, inputs, outputs, targets, config)` function (lines 155-183) in `custom_reg.py`. This function is called at every training step and returns a scalar loss that is added to the cross-entropy loss.
You can use:
- `model`: the full `nn.Module` — iterate over `model.named_parameters()` or `model.named_modules()` for weight-based penalties
- `inputs`: `[B, 3, 32, 32]` — the input batch (for input-dependent regularization)
- `outputs`: `[B, num_classes]` — the model logits (for output-based penalties like confidence/entropy)
- `targets`: `[B]` — integer class labels
- `config`: dict with `num_classes` (int), `epoch` (int, 0-indexed), `total_epochs` (int)
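A trivial implementation that exercises this interface might look like the following sketch. The L1-on-conv-weights penalty and the `l1_coef` value are illustrative assumptions, not the benchmark's method:

```python
import torch
import torch.nn as nn

def compute_regularization(model: nn.Module,
                           inputs: torch.Tensor,    # [B, 3, 32, 32]
                           outputs: torch.Tensor,   # [B, num_classes]
                           targets: torch.Tensor,   # [B]
                           config: dict) -> torch.Tensor:
    """Return a scalar term added to the cross-entropy loss each step."""
    l1_coef = 1e-5  # illustrative coefficient, would need tuning
    # Architecture-aware example: L1 norm over conv weights only.
    penalty = outputs.new_zeros(())
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            penalty = penalty + module.weight.abs().sum()
    return l1_coef * penalty
```

Note that the returned tensor must stay on the computation graph (as here) so gradients flow back into the penalized weights.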
Design ideas:
- Weight-based: L1/L2 norms, orthogonality, spectral norms, weight correlation
- Output-based: entropy, confidence penalty, label smoothing effect, logit penalties
- Activation-based: sparsity, diversity (requires forward hooks)
- Epoch-dependent: warm-up schedules, annealing, curriculum regularization
- Architecture-aware: different penalties for conv vs linear, depth-dependent scaling
Note: Standard L2 weight decay (5e-4) is already applied via the optimizer. Your regularization term is additional.
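For the epoch-dependent direction, a warm-up schedule can be built from `config` alone. This sketch ramps a penalty coefficient linearly over the first quarter of training; the schedule shape and the base coefficient are assumptions for illustration:

```python
def warmup_weight(config: dict, base: float = 0.05, frac: float = 0.25) -> float:
    """Linearly ramp a regularization coefficient from 0 to `base`
    over the first `frac` of training, then hold it constant.

    Uses only config["epoch"] (0-indexed) and config["total_epochs"].
    """
    warmup_epochs = max(1, int(frac * config["total_epochs"]))
    progress = min(1.0, config["epoch"] / warmup_epochs)
    return base * progress
```

The returned scalar would multiply whatever penalty `compute_regularization` produces, so regularization stays out of the way early in training and reaches full strength once the network has fit the easy structure.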
Evaluation
- Metric: Best test accuracy (%, higher is better)
- Architectures & datasets:
- ResNet-56 on CIFAR-100 (deep residual, 100 classes)
- VGG-16-BN on CIFAR-100 (deep non-residual with BatchNorm, 100 classes)
- MobileNetV2 on FashionMNIST (lightweight inverted-residual, 10 classes) — hidden, evaluated on final submission only
- Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
- Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
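In PyTorch terms, the fixed optimizer setup above corresponds roughly to the following sketch (for reference only; the actual loop lives in the fixed portion of `custom_reg.py`):

```python
import torch

def make_optimizer(model: torch.nn.Module, epochs: int = 200):
    """SGD + cosine annealing matching the fixed training recipe."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=5e-4)
    # Cosine schedule decays lr from 0.1 to ~0 over the full run.
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched
```

Because weight decay (5e-4) is applied here by the optimizer, any additional L2-style term returned from `compute_regularization` stacks on top of it.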
Code
```python
"""CV Regularization Benchmark.

Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST to evaluate
regularization strategies.

FIXED: Model architectures, weight initialization, data pipeline, training loop.
EDITABLE: compute_regularization() function.

Usage:
    python custom_reg.py --arch resnet20 --dataset cifar10 --seed 42
"""

import argparse
import math
import os
```
Results
| Model | Type | ResNet-56 / CIFAR-100 test acc (%) ↑ | VGG-16-BN / CIFAR-100 test acc (%) ↑ | MobileNetV2 / FashionMNIST test acc (%) ↑ |
|---|---|---|---|---|
| confidence_penalty | baseline | 72.660 | 74.310 | 94.900 |
| dropblock | baseline | 72.210 | 73.910 | 91.840 |
| dropblock | baseline | 71.360 | 1.000 | 94.660 |
| dropblock | baseline | 72.650 | 1.000 | 94.450 |
| dropblock | baseline | 72.350 | 72.740 | 94.290 |
| dropblock | baseline | 71.730 | 1.000 | 94.570 |
| dropblock | baseline | 72.450 | 73.370 | 94.690 |
| orthogonal_reg | baseline | 73.160 | 74.030 | 94.850 |
| anthropic/claude-opus-4.6 | vanilla | 72.490 | 1.000 | 94.410 |
| deepseek-reasoner | vanilla | - | 1.000 | - |
| google/gemini-3.1-pro-preview | vanilla | 75.250 | 76.520 | 95.480 |
| openai/gpt-5.4 | vanilla | 72.810 | 1.000 | 94.460 |
| qwen/qwen3.6-plus | vanilla | 72.770 | 73.520 | 94.570 |
| anthropic/claude-opus-4.6 | agent | 71.710 | 74.180 | 94.540 |
| deepseek-reasoner | agent | - | 72.980 | 94.740 |
| google/gemini-3.1-pro-preview | agent | 75.250 | 76.520 | 95.480 |
| openai/gpt-5.4 | agent | 72.180 | 72.970 | 94.710 |
| qwen/qwen3.6-plus | agent | 72.770 | 73.520 | 94.570 |