dl-regularization

Deep Learning · pytorch-vision · rigorous codebase

Description

DL Regularization Strategy Design

Research Question

Design a novel regularization strategy for deep convolutional neural networks that improves generalization (test accuracy) across different architectures and datasets.

Background

Regularization is essential for preventing overfitting and improving generalization in deep neural networks. Beyond standard weight decay (L2 penalty), many regularization techniques have been proposed:

  • DropBlock-inspired spatial co-activation penalty (Ghiasi et al., 2018): Penalizes local spatial co-activation in feature maps, discouraging reliance on contiguous regions — captures the core insight of DropBlock as a loss-based regularizer
  • Confidence penalty (Pereyra et al., 2017): Penalizes low-entropy output distributions to prevent overconfidence
  • Orthogonal regularization (Brock et al., 2017): Encourages weight matrices to be orthogonal, preserving gradient flow

However, these methods typically apply a fixed penalty throughout training and do not adapt to training dynamics, model architecture, or the relationship between different layer types. There is room to design regularization strategies that are more adaptive, architecture-aware, or that combine multiple complementary penalties.

What You Can Modify

The compute_regularization(model, inputs, outputs, targets, config) function (lines 155-183) in custom_reg.py. This function is called every training step and returns a scalar loss that is added to the cross-entropy loss.

You can use:

  • model: the full nn.Module — iterate over model.named_parameters() or model.named_modules() for weight-based penalties
  • inputs: [B, 3, 32, 32] — the input batch (for input-dependent regularization)
  • outputs: [B, num_classes] — the model logits (for output-based penalties like confidence/entropy)
  • targets: [B] — integer class labels
  • config: dict with num_classes (int), epoch (int, 0-indexed), total_epochs (int)

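As a minimal illustration of the hook's signature, here is a sketch of a confidence-penalty regularizer (Pereyra et al., 2017) with a linear warm-up. The penalty weight `beta` and the warm-up fraction are illustrative choices, not benchmark defaults:

```python
import torch
import torch.nn.functional as F

def compute_regularization(model, inputs, outputs, targets, config):
    """Sketch: entropy-based confidence penalty with a linear warm-up
    over the first ~10% of training. Hyperparameters are illustrative."""
    beta = 0.1  # penalty strength (hypothetical)
    warmup = max(1, config["total_epochs"] // 10)
    scale = min(1.0, (config["epoch"] + 1) / warmup)

    log_probs = F.log_softmax(outputs, dim=1)
    probs = log_probs.exp()
    # Negative entropy of the predictive distribution, averaged over the batch.
    neg_entropy = (probs * log_probs).sum(dim=1).mean()
    # Adding negative entropy to the loss penalizes low-entropy (overconfident) outputs.
    return beta * scale * neg_entropy
```

Note that only `outputs` and `config` are used here; the other arguments are available for input-, target-, or weight-dependent variants.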
Design ideas:

  • Weight-based: L1/L2 norms, orthogonality, spectral norms, weight correlation
  • Output-based: entropy, confidence penalty, label smoothing effect, logit penalties
  • Activation-based: sparsity, diversity (requires forward hooks)
  • Epoch-dependent: warm-up schedules, annealing, curriculum regularization
  • Architecture-aware: different penalties for conv vs linear, depth-dependent scaling
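For the activation-based idea, hooks must be registered before the forward pass, so the cached activations are available when `compute_regularization` runs. A sketch of an L1 activation-sparsity penalty; `attach_hooks` and the `_acts` cache are hypothetical helpers, not part of the benchmark harness, and the weight `lam` is illustrative:

```python
import torch
import torch.nn as nn

_acts = {}  # hypothetical module-level cache of ReLU outputs

def _make_hook(name):
    def hook(module, inp, out):
        _acts[name] = out
    return hook

def attach_hooks(model):
    # Call once before training: cache every ReLU output for the penalty.
    for name, m in model.named_modules():
        if isinstance(m, nn.ReLU):
            m.register_forward_hook(_make_hook(name))

def compute_regularization(model, inputs, outputs, targets, config):
    """Sketch: L1 sparsity penalty on cached ReLU activations.
    Requires attach_hooks(model) to have been called beforehand."""
    lam = 1e-5  # penalty strength (hypothetical)
    penalty = outputs.new_zeros(())
    for act in _acts.values():
        penalty = penalty + act.abs().mean()
    return lam * penalty
```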

Note: Standard L2 weight decay (5e-4) is already applied via the optimizer. Your regularization term is additional.
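Combining the weight-based, epoch-dependent, and architecture-aware ideas, one hedged sketch is a soft orthogonality penalty (Brock et al., 2017) applied to conv layers only and annealed with a cosine schedule; the base strength `lam` is an illustrative choice, and the term is additional to the optimizer's built-in weight decay:

```python
import math
import torch
import torch.nn as nn

def compute_regularization(model, inputs, outputs, targets, config):
    """Sketch: soft orthogonality on Conv2d weights, cosine-annealed to zero
    over training. Hyperparameters are illustrative, not benchmark defaults."""
    lam = 1e-4  # base strength (hypothetical)
    # Cosine-anneal the penalty from lam down to 0 across total_epochs.
    t = config["epoch"] / max(1, config["total_epochs"] - 1)
    scale = 0.5 * (1.0 + math.cos(math.pi * t))

    penalty = outputs.new_zeros(())
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            w = module.weight.flatten(1)  # [out_ch, in_ch * k * k]
            gram = w @ w.t()
            eye = torch.eye(gram.size(0), device=w.device)
            # ||W W^T - I||_F^2 pushes filter rows toward orthonormality.
            penalty = penalty + (gram - eye).pow(2).sum()
    return lam * scale * penalty
```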

Evaluation

  • Metric: Best test accuracy (%, higher is better)
  • Architectures & datasets:
    • ResNet-56 on CIFAR-100 (deep residual, 100 classes)
    • VGG-16-BN on CIFAR-100 (deep non-residual with BatchNorm, 100 classes)
    • MobileNetV2 on FashionMNIST (lightweight inverted-residual, 10 classes) — hidden, evaluated on final submission only
  • Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
  • Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
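The fixed recipe above corresponds to a standard PyTorch setup along these lines (a sketch for orientation; the benchmark's actual training loop lives in custom_reg.py, and `make_optimizer` is a hypothetical helper name):

```python
import torch

def make_optimizer(model, total_epochs=200):
    # SGD with the benchmark's stated hyperparameters.
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                          weight_decay=5e-4)
    # Cosine annealing of the learning rate over the full run.
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_epochs)
    return opt, sched
```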

Code

custom_reg.py
```python
"""CV Regularization Benchmark.

Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST to evaluate
regularization strategies.

FIXED: Model architectures, weight initialization, data pipeline, training loop.
EDITABLE: compute_regularization() function.

Usage:
    python custom_reg.py --arch resnet20 --dataset cifar10 --seed 42
"""

import argparse
import math
import os
```
Results

| Model | Type | Test acc (%): ResNet-56 / CIFAR-100 | Test acc (%): VGG-16-BN / CIFAR-100 | Test acc (%): MobileNetV2 / FashionMNIST |
|---|---|---|---|---|
| confidence_penalty | baseline | 72.660 | 74.310 | 94.900 |
| dropblock | baseline | 72.210 | 73.910 | 91.840 |
| dropblock | baseline | 71.360 | 1.000 | 94.660 |
| dropblock | baseline | - | - | - |
| dropblock | baseline | 72.650 | 1.000 | 94.450 |
| dropblock | baseline | - | - | - |
| dropblock | baseline | 72.350 | 72.740 | 94.290 |
| dropblock | baseline | - | - | - |
| dropblock | baseline | - | - | - |
| dropblock | baseline | - | - | - |
| dropblock | baseline | 71.730 | 1.000 | 94.570 |
| dropblock | baseline | 72.450 | 73.370 | 94.690 |
| orthogonal_reg | baseline | 73.160 | 74.030 | 94.850 |
| anthropic/claude-opus-4.6 | vanilla | 72.490 | 1.000 | 94.410 |
| deepseek-reasoner | vanilla | - | 1.000 | - |
| google/gemini-3.1-pro-preview | vanilla | 75.250 | 76.520 | 95.480 |
| openai/gpt-5.4 | vanilla | 72.810 | 1.000 | 94.460 |
| qwen/qwen3.6-plus | vanilla | 72.770 | 73.520 | 94.570 |
| anthropic/claude-opus-4.6 | agent | 71.710 | 74.180 | 94.540 |
| deepseek-reasoner | agent | - | 72.980 | 94.740 |
| google/gemini-3.1-pro-preview | agent | 75.250 | 76.520 | 95.480 |
| openai/gpt-5.4 | agent | 72.180 | 72.970 | 94.710 |
| qwen/qwen3.6-plus | agent | 72.770 | 73.520 | 94.570 |

Agent Conversations