# dl-normalization

## Description

DL Normalization Layer Design

## Research Question
Design a novel normalization layer for deep convolutional neural networks that improves training stability and final test accuracy across different architectures and datasets.
## Background
Normalization layers are critical components in modern deep networks, controlling internal covariate shift and enabling stable training at higher learning rates. Classic methods include:
- BatchNorm (Ioffe & Szegedy, 2015): Normalizes across the batch dimension per channel. The de facto standard, but depends on batch statistics and behaves differently at train/test time.
- GroupNorm (Wu & He, 2018): Divides channels into groups and normalizes within each group. Batch-size independent.
- InstanceNorm (Ulyanov et al., 2016): Normalizes each channel independently per instance. Common in style transfer.
- LayerNorm (Ba et al., 2016): Normalizes across all channels for each sample. Standard in transformers but less common in CNNs.
However, each method has limitations: BatchNorm degrades with small batches, GroupNorm requires choosing the number of groups, InstanceNorm discards inter-channel information, and LayerNorm may not suit spatial feature maps well. There is room to design normalization strategies that combine strengths of multiple approaches or introduce novel normalization statistics.
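The four classic schemes differ mainly in which axes the mean and variance are computed over. A minimal NumPy sketch (illustrative only; the benchmark itself is in PyTorch) for an input of shape `[B, C, H, W]`:

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    # Zero-mean, unit-variance normalization over the given axes.
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16, 4, 4))   # [B, C, H, W]

bn = normalize(x, (0, 2, 3))   # BatchNorm: one mean/var per channel
ln = normalize(x, (1, 2, 3))   # LayerNorm: one mean/var per sample
inn = normalize(x, (2, 3))     # InstanceNorm: per sample *and* channel

# GroupNorm: reshape channels into G groups, normalize within each group.
G = 4
xg = x.reshape(8, G, 16 // G, 4, 4)
gn = normalize(xg, (2, 3, 4)).reshape(x.shape)
```

Each variant leaves its normalized slice with (near-)zero mean: `bn[:, 0]`, `ln[0]`, `inn[0, 0]`, and the first group `gn[0, :4]` all average to roughly zero.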
## What You Can Modify

The `CustomNorm` class (lines 31-45) in `custom_norm.py`. This class must be a drop-in replacement for `nn.BatchNorm2d`:
- Constructor takes `num_features` (number of channels C)
- Input shape: `[B, C, H, W]`
- Output shape: `[B, C, H, W]`
You can modify:
- The normalization statistics (mean/variance computation: over batch, channel, spatial, or combinations)
- Learnable affine parameters (scale and shift)
- The normalization grouping strategy
- Combining multiple normalization approaches
- Adaptive or input-dependent normalization
- Any other normalization design that maintains the interface
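As one concrete illustration of the required interface, here is a hypothetical `CustomNorm` that keeps the `nn.BatchNorm2d`-style constructor and `[B, C, H, W]` shapes but uses GroupNorm-style statistics. This is a sketch, not the benchmark's reference implementation; `num_groups` and its divisibility fallback are invented for the example:

```python
import torch
import torch.nn as nn

class CustomNorm(nn.Module):
    """Hypothetical drop-in BatchNorm2d replacement using group statistics."""

    def __init__(self, num_features, num_groups=8, eps=1e-5):
        super().__init__()
        # Fallback: halve the group count until it divides num_features.
        while num_features % num_groups != 0:
            num_groups //= 2
        self.num_groups = num_groups
        self.eps = eps
        # Learnable per-channel affine parameters, as in BatchNorm2d.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        b, c, h, w = x.shape
        g = self.num_groups
        # Normalize within each (sample, group) over channels and space.
        xg = x.view(b, g, c // g, h, w)
        mean = xg.mean(dim=(2, 3, 4), keepdim=True)
        var = xg.var(dim=(2, 3, 4), unbiased=False, keepdim=True)
        xg = (xg - mean) / torch.sqrt(var + self.eps)
        x = xg.view(b, c, h, w)
        return x * self.weight.view(1, c, 1, 1) + self.bias.view(1, c, 1, 1)
```

Because the statistics are per-sample, this variant is batch-size independent and behaves identically at train and test time, unlike BatchNorm.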
## Evaluation
- Metric: Best test accuracy (%, higher is better)
- Architectures & datasets:
- ResNet-56 on CIFAR-100 (deep residual, 100 classes)
- MobileNetV2 on FashionMNIST (lightweight inverted-residual, 10 classes)
- ResNet-110 on CIFAR-100 (very deep residual, 100 classes) — hidden, evaluated on final submission only
- Training: SGD (lr = 0.1, momentum = 0.9, weight decay = 5e-4), cosine annealing, 200 epochs
- Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
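The fixed recipe above corresponds to a standard PyTorch setup. A minimal sketch with a stand-in model (the real benchmark builds ResNet-56/110 or MobileNetV2 inside `custom_norm.py`):

```python
import torch

# Stand-in model: the benchmark actually trains ResNet-56/110 or MobileNetV2.
model = torch.nn.Conv2d(3, 16, kernel_size=3)

# SGD with the fixed hyperparameters: lr=0.1, momentum=0.9, weight decay=5e-4.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4
)

# Cosine annealing over the full 200-epoch budget (lr decays 0.1 -> 0).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

# Per epoch: train, then scheduler.step().
# Augmentation (not shown): RandomCrop(32, padding=4) + RandomHorizontalFlip.
```

With `T_max=200` the learning rate passes through 0.05 at epoch 100 and reaches 0 at epoch 200.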
## Code

```python
"""CV Normalization Layer Benchmark.

Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST to evaluate
normalization layer designs.

FIXED: Model architectures, data pipeline, training loop.
EDITABLE: CustomNorm class.

Usage:
    python custom_norm.py --arch resnet20 --dataset cifar10 --seed 42
"""

import argparse
import math
import os
```
## Results

| Model | Type | ResNet-56 / CIFAR-100 test acc (%) ↑ | ResNet-110 / CIFAR-100 test acc (%) ↑ | MobileNetV2 / FashionMNIST test acc (%) ↑ |
|---|---|---|---|---|
| batch_instance_norm | baseline | 66.060 | 68.650 | 93.640 |
| group_norm | baseline | 67.900 | 70.430 | 93.160 |
| switchable_norm | baseline | 68.950 | 70.580 | 94.100 |
| anthropic/claude-opus-4.6 | vanilla | 72.650 | 73.930 | 94.710 |
| deepseek-reasoner | vanilla | 67.140 | 50.960 | 90.540 |
| google/gemini-3.1-pro-preview | vanilla | 72.390 | 74.860 | 94.590 |
| openai/gpt-5.4 | vanilla | 66.660 | - | 94.080 |
| qwen/qwen3.6-plus | vanilla | - | - | - |
| anthropic/claude-opus-4.6 | agent | 72.330 | 74.500 | 94.220 |
| deepseek-reasoner | agent | 63.020 | 54.200 | 90.890 |
| google/gemini-3.1-pro-preview | agent | 72.390 | 74.860 | 94.590 |
| openai/gpt-5.4 | agent | 72.300 | - | 94.660 |
| qwen/qwen3.6-plus | agent | 1.000 | 1.000 | 10.000 |