# dl-normalization

## Description

DL Normalization Layer Design

## Research Question
Design a novel normalization layer for deep convolutional neural networks that improves training stability and final test accuracy across different architectures and datasets.
## Background
Normalization layers are critical components in modern deep networks, controlling internal covariate shift and enabling stable training at higher learning rates. Classic methods include:
- BatchNorm (Ioffe & Szegedy, 2015): Normalizes across the batch dimension per channel. The de facto standard, but depends on batch statistics and behaves differently at train/test time.
- GroupNorm (Wu & He, 2018): Divides channels into groups and normalizes within each group. Batch-size independent.
- InstanceNorm (Ulyanov et al., 2016): Normalizes each channel independently per instance. Common in style transfer.
- LayerNorm (Ba et al., 2016): Normalizes across all channels for each sample. Standard in transformers but less common in CNNs.
However, each method has limitations: BatchNorm degrades with small batches, GroupNorm requires choosing the number of groups, InstanceNorm discards inter-channel information, and LayerNorm may not suit spatial feature maps well. There is room to design normalization strategies that combine strengths of multiple approaches or introduce novel normalization statistics.
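The four classic schemes differ mainly in which axes the mean and variance are computed over. A minimal NumPy sketch (illustrative only; the benchmark itself is in PyTorch) for an input of shape `[B, C, H, W]`:

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    # Zero-mean, unit-variance normalization over the given axes.
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16, 4, 4))   # [B, C, H, W]

bn = normalize(x, (0, 2, 3))   # BatchNorm: one mean/var per channel
ln = normalize(x, (1, 2, 3))   # LayerNorm: one mean/var per sample
inn = normalize(x, (2, 3))     # InstanceNorm: per sample *and* channel

# GroupNorm: reshape channels into G groups, normalize within each group.
G = 4
xg = x.reshape(8, G, 16 // G, 4, 4)
gn = normalize(xg, (2, 3, 4)).reshape(x.shape)
```

Each variant leaves its normalized slice with (near-)zero mean: `bn[:, 0]`, `ln[0]`, `inn[0, 0]`, and the first group `gn[0, :4]` all average to roughly zero.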
## What You Can Modify

The `CustomNorm` class (lines 31-45) in `custom_norm.py`. This class must be a drop-in replacement for `nn.BatchNorm2d`:
- Constructor takes `num_features` (number of channels C)
- Input shape: `[B, C, H, W]`
- Output shape: `[B, C, H, W]`
You can modify:
- The normalization statistics (mean/variance computation: over batch, channel, spatial, or combinations)
- Learnable affine parameters (scale and shift)
- The normalization grouping strategy
- Combining multiple normalization approaches
- Adaptive or input-dependent normalization
- Any other normalization design that maintains the interface
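As one concrete illustration of the required interface, here is a hypothetical `CustomNorm` that keeps the `nn.BatchNorm2d`-style constructor and `[B, C, H, W]` shapes but uses GroupNorm-style statistics. This is a sketch, not the benchmark's reference implementation; `num_groups` and its divisibility fallback are invented for the example:

```python
import torch
import torch.nn as nn

class CustomNorm(nn.Module):
    """Hypothetical drop-in BatchNorm2d replacement using group statistics."""

    def __init__(self, num_features, num_groups=8, eps=1e-5):
        super().__init__()
        # Fallback: halve the group count until it divides num_features.
        while num_features % num_groups != 0:
            num_groups //= 2
        self.num_groups = num_groups
        self.eps = eps
        # Learnable per-channel affine parameters, as in BatchNorm2d.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        b, c, h, w = x.shape
        g = self.num_groups
        # Normalize within each (sample, group) over channels and space.
        xg = x.view(b, g, c // g, h, w)
        mean = xg.mean(dim=(2, 3, 4), keepdim=True)
        var = xg.var(dim=(2, 3, 4), unbiased=False, keepdim=True)
        xg = (xg - mean) / torch.sqrt(var + self.eps)
        x = xg.view(b, c, h, w)
        return x * self.weight.view(1, c, 1, 1) + self.bias.view(1, c, 1, 1)
```

Because the statistics are per-sample, this variant is batch-size independent and behaves identically at train and test time, unlike BatchNorm.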
## Evaluation
- Metric: Best test accuracy (%, higher is better)
- Architectures & datasets:
- ResNet-56 on CIFAR-100 (deep residual, 100 classes)
- MobileNetV2 on FashionMNIST (lightweight inverted-residual, 10 classes)
- ResNet-110 on CIFAR-100 (very deep residual, 100 classes) — hidden, evaluated on final submission only
- Training: SGD (lr = 0.1, momentum = 0.9, weight decay = 5e-4), cosine annealing, 200 epochs
- Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
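The fixed recipe above corresponds to a standard PyTorch setup. A minimal sketch with a stand-in model (the real benchmark builds ResNet-56/110 or MobileNetV2 inside `custom_norm.py`):

```python
import torch

# Stand-in model: the benchmark actually trains ResNet-56/110 or MobileNetV2.
model = torch.nn.Conv2d(3, 16, kernel_size=3)

# SGD with the fixed hyperparameters: lr=0.1, momentum=0.9, weight decay=5e-4.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4
)

# Cosine annealing over the full 200-epoch budget (lr decays 0.1 -> 0).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

# Per epoch: train, then scheduler.step().
# Augmentation (not shown): RandomCrop(32, padding=4) + RandomHorizontalFlip.
```

With `T_max=200` the learning rate passes through 0.05 at epoch 100 and reaches 0 at epoch 200.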
## Code

```python
"""CV Normalization Layer Benchmark.

Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST to evaluate
normalization layer designs.

FIXED: Model architectures, data pipeline, training loop.
EDITABLE: CustomNorm class.

Usage:
    python custom_norm.py --arch resnet20 --dataset cifar10 --seed 42
"""

import argparse
import math
import os
```
## Results

| Model | Type | ResNet-56 / CIFAR-100 test acc (%) ↑ | ResNet-110 / CIFAR-100 test acc (%) ↑ | MobileNetV2 / FashionMNIST test acc (%) ↑ |
|---|---|---|---|---|
| batch_instance_norm | baseline | 66.060 | 68.650 | 93.640 |
| group_norm | baseline | 67.900 | 70.430 | 93.160 |
| switchable_norm | baseline | 68.950 | 70.580 | 94.100 |
| anthropic/claude-opus-4.6 | vanilla | 72.650 | 73.930 | 94.710 |
| deepseek-reasoner | vanilla | 67.140 | 50.960 | 90.540 |
| google/gemini-3.1-pro-preview | vanilla | 72.390 | 74.860 | 94.590 |
| openai/gpt-5.4 | vanilla | 66.660 | - | 94.080 |
| qwen/qwen3.6-plus | vanilla | - | - | - |
| anthropic/claude-opus-4.6 | agent | 72.330 | 74.500 | 94.220 |
| deepseek-reasoner | agent | 63.020 | 54.200 | 90.890 |
| google/gemini-3.1-pro-preview | agent | 72.390 | 74.860 | 94.590 |
| openai/gpt-5.4 | agent | 72.300 | - | 94.660 |
| qwen/qwen3.6-plus | agent | 1.000 | 1.000 | 10.000 |