dl-residual-connection

Deep Learning · pytorch-vision · rigorous codebase

Description

DL Residual Connection Block Design

Research Question

Design a novel residual/skip connection block for deep convolutional neural networks that improves test accuracy across different depths and datasets.

Background

Residual connections (He et al., 2016) enabled training of very deep networks by providing identity shortcut paths. The basic residual block adds the input to the output of two stacked 3x3 convolutions. Several improvements have been proposed:

  • Pre-activation ResBlock (He et al., 2016, "Identity Mappings in Deep Residual Networks"): BN-ReLU-Conv ordering instead of Conv-BN-ReLU, enabling cleaner gradient flow
  • Gated Residual (ReZero / learnable scaling): A learnable scalar gate scales the residual branch before addition, allowing the network to learn optimal residual contribution per block
  • Stochastic Depth (Huang et al., ECCV 2016): Randomly drops entire residual blocks during training with linearly decaying survival probability, acting as an implicit ensemble regularizer that is especially effective for very deep networks
  • ResNeXt (Xie et al., 2017): Grouped convolutions for multi-branch aggregation
  • Res2Net (Gao et al., 2019): Multi-scale feature extraction within a single residual block

There is room for novel block designs that better balance gradient flow, feature reuse, and computational efficiency, particularly for varying network depths.
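To make the designs above concrete, here is a minimal sketch of a basic residual block with an optional ReZero-style learnable gate on the residual branch. The class name, the `rezero` flag, and the single-channel layout are illustrative assumptions, not the benchmark's reference code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResBlock(nn.Module):
    """Basic residual block (two stacked 3x3 convs) with an optional
    ReZero-style gate `alpha` scaling the residual branch.
    Illustrative sketch, not the benchmark's reference implementation."""

    def __init__(self, channels, rezero=False):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        # ReZero gate: initialized to zero, so the block starts as the identity
        # and learns how much residual signal to admit.
        self.alpha = nn.Parameter(torch.zeros(1)) if rezero else None

    def forward(self, x):
        out = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        if self.alpha is not None:
            out = self.alpha * out
        return F.relu(x + out)
```

With `rezero=True` the block is an exact identity (up to the final ReLU) at initialization, which is what makes very deep stacks trainable from the start.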

What You Can Modify

The CustomBlock class (lines 31-61) in custom_residual.py. This is the residual block used by the ResNet backbone.

You can modify:

  • The internal convolution structure (number, kernel sizes, grouping)
  • The activation function placement and type
  • The normalization layer placement and type
  • The shortcut/skip connection design
  • Channel attention or spatial attention mechanisms
  • The expansion class attribute (1 for basic, 4 for bottleneck, etc.)
  • Any additional modules within the block

Constraints:

  • The block must accept (in_planes, planes, stride) constructor arguments
  • The block must have an expansion class attribute
  • forward(x) must return a tensor with channels = planes * expansion
  • The shortcut must handle dimension mismatches (stride != 1 or channel mismatch)
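A skeleton satisfying this interface might look as follows. This is a hedged sketch of the required contract (constructor signature, `expansion` attribute, projection shortcut), not the file's actual `CustomBlock`; the layer choices inside are placeholders you would redesign.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CustomBlock(nn.Module):
    """Minimal block meeting the benchmark's interface constraints.
    Internal structure is illustrative and meant to be replaced."""

    expansion = 1  # output channels = planes * expansion

    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride,
                               padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        # Projection shortcut whenever spatial size or channel count changes.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes * self.expansion:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes * self.expansion, 1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(planes * self.expansion),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))
```

Any redesign can change what happens between input and addition, as long as `forward` still returns `planes * expansion` channels at the strided resolution.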

Evaluation

  • Metric: Best test accuracy (%, higher is better)
  • Architectures & datasets:
    • ResNet-20 ([3,3,3]) on CIFAR-10 (shallow, 10 classes)
    • ResNet-56 ([9,9,9]) on CIFAR-100 (deep, 100 classes)
    • ResNet-110 ([18,18,18]) on CIFAR-100 (very deep, 100 classes — tests gradient flow) — hidden, evaluated on final submission only
  • Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
  • Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
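The fixed training recipe above corresponds to a standard PyTorch optimizer/scheduler setup, sketched below (the `Linear` stand-in replaces the actual ResNet, which lives in the fixed part of `custom_residual.py`):

```python
import torch

# Stand-in for the fixed ResNet backbone.
model = torch.nn.Linear(10, 10)

# SGD with the benchmark's fixed hyperparameters.
opt = torch.optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=5e-4)

# Cosine annealing over the full 200-epoch budget (decays lr from 0.1 to 0).
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=200)
```

Since these knobs are fixed, any accuracy gain must come from the block design itself rather than from tuning the schedule.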

Code

custom_residual.py
```python
"""CV Residual Connection Benchmark.

Train CIFAR ResNets with custom residual blocks to evaluate
skip/residual connection designs.

FIXED: ResNet backbone, data pipeline, training loop.
EDITABLE: CustomBlock class (residual block design).

Usage:
    python custom_residual.py --arch resnet20 --dataset cifar10 --seed 42
"""

import argparse
import math
import os
```

Results

| Model | Type | Test acc: ResNet-20 / CIFAR-10 | Test acc: ResNet-56 / CIFAR-100 | Test acc: ResNet-110 / CIFAR-100 |
|---|---|---|---|---|
| gated_residual | baseline | 92.960 | 71.980 | 73.460 |
| pre_activation | baseline | 92.620 | 71.780 | 74.080 |
| stochastic_depth | baseline | 91.370 | 74.560 | 75.700 |
| stochastic_depth | baseline | 91.350 | 73.650 | 75.570 |
| anthropic/claude-opus-4.6 | vanilla | 92.600 | 72.590 | 73.870 |
| deepseek-reasoner | vanilla | 91.830 | 73.270 | 74.710 |
| google/gemini-3.1-pro-preview | vanilla | 92.050 | 71.870 | 74.630 |
| openai/gpt-5.4 | vanilla | 92.540 | 72.100 | 73.880 |
| qwen/qwen3.6-plus | vanilla | 92.580 | 72.770 | 74.710 |
| anthropic/claude-opus-4.6 | agent | 91.620 | 73.700 | 75.250 |
| deepseek-reasoner | agent | 91.830 | 73.270 | 74.710 |
| google/gemini-3.1-pro-preview | agent | 92.980 | 74.640 | 76.060 |
| openai/gpt-5.4 | agent | 92.620 | 72.990 | 75.680 |
| qwen/qwen3.6-plus | agent | 92.580 | 72.770 | 74.710 |

Agent Conversations