dl-weight-initialization

Deep Learning · pytorch-vision · rigorous codebase

Description

DL Weight Initialization Strategy Design

Research Question

Design a novel weight initialization strategy for deep convolutional neural networks that improves convergence speed and final test accuracy across different architectures and datasets.

Background

Weight initialization is fundamental to training deep neural networks. Poor initialization leads to vanishing/exploding gradients, slow convergence, or suboptimal generalization. Classic methods include:

  • Kaiming/He (2015): Accounts for the ReLU nonlinearity; samples weights from N(0, 2/fan_out), i.e. std = sqrt(2/fan_out)
  • Orthogonal (2014): Preserves gradient norms via orthogonal weight matrices
  • Fixup (2019): Controls variance accumulation across depth; the variant described here scales the last conv in each residual block by L^(-0.5), where L is the number of blocks, and zero-initializes each block's last BN so residual branches start near identity (Fixup proper trains without normalization and instead zero-initializes the last weight layer of each branch)
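As a concrete illustration of the Kaiming rule above, here is a minimal sketch using PyTorch's built-in initializer; the layer sizes are arbitrary and not taken from the benchmark:

```python
import math
import torch.nn as nn

# Sketch: Kaiming/He init on a conv layer, checking that the empirical
# weight std matches the target sqrt(2 / fan_out).
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)
nn.init.kaiming_normal_(conv.weight, mode="fan_out", nonlinearity="relu")

fan_out = 128 * 3 * 3  # out_channels * kernel_h * kernel_w
expected_std = math.sqrt(2.0 / fan_out)
print(f"target std {expected_std:.4f}, empirical {conv.weight.std().item():.4f}")
```

With `mode="fan_out"` the backward-pass variance is preserved through the layer, which is the convention torchvision's ResNets use.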

However, these methods each address only one aspect of initialization. There is room to design strategies that jointly account for residual connections, batch normalization's re-scaling effect, depth-dependent scaling, and the interaction between different layer types (convolution vs classifier).

What You Can Modify

The initialize_weights(model, config) function (lines 147-180) in custom_init.py. This function receives the fully constructed model and a config dict, and must initialize all parameters.

You can modify:

  • How nn.Conv2d weights are initialized (distribution, fan-in/fan-out, gain)
  • How nn.BatchNorm2d parameters (weight/bias) are initialized
  • How nn.Linear weights and biases are initialized
  • Per-layer or depth-dependent scaling strategies
  • Special handling for residual shortcut projections vs main-path convolutions
  • Any data-independent initialization logic (no training data access)

The config dict provides: arch (str), num_classes (int), depth (int = number of Conv2d + Linear layers). You can also iterate over model.named_modules() or model.named_parameters().
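A minimal sketch of what such a function could look like, using the signature and config keys stated above. This is illustrative only, not the benchmark's reference code; the depth-dependent damping factor and the `shortcut`/`downsample` name check are assumptions:

```python
import torch
import torch.nn as nn

def initialize_weights(model, config):
    """Illustrative sketch of the editable entry point, not a reference
    implementation: Kaiming conv init plus an assumed depth-aware damping."""
    depth = max(config.get("depth", 1), 1)  # number of Conv2d + Linear layers
    scale = depth ** -0.5                   # assumed Fixup-style damping factor
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            nn.init.kaiming_normal_(module.weight, mode="fan_out",
                                    nonlinearity="relu")
            # Assumed naming convention: damp shortcut/downsample projections.
            if "shortcut" in name or "downsample" in name:
                with torch.no_grad():
                    module.weight.mul_(scale)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif isinstance(module, nn.BatchNorm2d):
            nn.init.ones_(module.weight)
            nn.init.zeros_(module.bias)
        elif isinstance(module, nn.Linear):
            nn.init.normal_(module.weight, mean=0.0, std=0.01)
            nn.init.zeros_(module.bias)
```

Since the function only sees the constructed model and the config dict, everything here is data-independent, as the task requires.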

Evaluation

  • Metric: Best test accuracy (%, higher is better)
  • Architectures & datasets:
    • ResNet-56 on CIFAR-100 (deep residual, 100 classes)
    • VGG-16-BN on CIFAR-100 (deep non-residual with BatchNorm, 100 classes)
    • MobileNetV2 on FashionMNIST (lightweight inverted-residual, 10 classes) — hidden, evaluated on final submission only
  • Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
  • Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
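The fixed training recipe above can be sketched as follows (a stand-in one-layer model replaces the benchmark's fixed architectures; only the optimizer and scheduler settings come from the task description):

```python
import torch
import torch.nn as nn

# Sketch of the stated recipe: SGD (lr=0.1, momentum=0.9, wd=5e-4)
# with cosine annealing over 200 epochs.
model = nn.Conv2d(3, 16, kernel_size=3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    # ... one training epoch over the augmented CIFAR loader would go here ...
    scheduler.step()

final_lr = optimizer.param_groups[0]["lr"]
print(f"final lr after cosine schedule: {final_lr:.2e}")
```

With `T_max=200` the learning rate decays from 0.1 to the scheduler's `eta_min` (0 by default) over the full run.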

Code

custom_init.py
"""CV Weight Initialization Benchmark.

Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST to evaluate
weight initialization strategies.

FIXED: Model architectures, data pipeline, training loop.
EDITABLE: initialize_weights() function.

Usage:
    python custom_init.py --arch resnet20 --dataset cifar10 --seed 42
"""

import argparse
import math
import os

Results

| Model | Type | Test acc: resnet56-cifar100 | Test acc: vgg16bn-cifar100 | Test acc: mobilenetv2-fmnist |
|---|---|---|---|---|
| fixup | baseline | 72.370 | 74.370 | 94.480 |
| kaiming_normal | baseline | 72.070 | 73.380 | 94.490 |
| orthogonal | baseline | 72.080 | 72.830 | 93.880 |
| anthropic/claude-opus-4.6 | vanilla | 72.700 | 74.350 | 94.850 |
| deepseek-reasoner | vanilla | 72.610 | 74.440 | 94.540 |
| google/gemini-3.1-pro-preview | vanilla | 72.990 | 74.210 | 94.500 |
| openai/gpt-5.4 | vanilla | 72.600 | 74.460 | 94.510 |
| qwen/qwen3.6-plus | vanilla | 72.920 | 74.250 | 94.610 |
| anthropic/claude-opus-4.6 | agent | 72.700 | 74.350 | 94.850 |
| deepseek-reasoner | agent | 72.610 | 74.440 | 94.540 |
| google/gemini-3.1-pro-preview | agent | 72.990 | 74.210 | 94.500 |
| openai/gpt-5.4 | agent | 72.550 | 74.890 | 94.520 |
| qwen/qwen3.6-plus | agent | 72.920 | 74.250 | 94.610 |

Agent Conversations