dl-activation-function

Tags: Deep Learning, pytorch-vision, rigorous codebase

Description

DL Activation Function Design

Research Question

Design a novel activation function for deep convolutional neural networks that improves test accuracy across different architectures (ResNet, VGG) and datasets (CIFAR-10, CIFAR-100).

Background

Activation functions introduce nonlinearity into neural networks and critically affect training dynamics and generalization. Classic choices include:

  • ReLU (2010): max(0, x) — simple, sparse, but zero gradient for negative inputs ("dying ReLU")
  • GELU (2016): x * Phi(x) — smooth approximation weighting by Gaussian CDF
  • Swish/SiLU (2017): x * sigmoid(x) — self-gated, smooth, non-monotonic
  • Mish (2019): x * tanh(softplus(x)) — self-regularized, smooth

These functions differ in smoothness, gating behavior, and negative-domain behavior, and may interact differently with modern network components such as residual connections and batch normalization.
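As a concrete reference, the four activations above can be written as scalar Python functions directly from the listed formulas (the benchmark itself applies them element-wise as PyTorch modules; this is only a numerical sketch):

```python
import math

def relu(x: float) -> float:
    # max(0, x): zero gradient for x < 0 ("dying ReLU")
    return max(0.0, x)

def gelu(x: float) -> float:
    # x * Phi(x), where Phi is the standard Gaussian CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def silu(x: float) -> float:
    # Swish/SiLU: x * sigmoid(x); smooth and non-monotonic
    return x / (1.0 + math.exp(-x))

def mish(x: float) -> float:
    # x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)
    return x * math.tanh(math.log1p(math.exp(x)))
```

All four agree at x = 0 and approach the identity for large positive x; they differ mainly in the negative domain, where GELU, SiLU, and Mish dip below zero before saturating while ReLU is exactly zero.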

What You Can Modify

You may modify the CustomActivation class (lines 31-48) in custom_activation.py: an nn.Module used as a drop-in replacement for ReLU throughout the network.

You can modify:

  • The forward computation (any element-wise or channel-wise operation)
  • Learnable parameters (registered in __init__)
  • The shape of the activation curve (monotonic, non-monotonic, bounded, etc.)
  • Negative-domain behavior (zero, linear, bounded, learnable)
  • Any stateless or stateful activation logic
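As one hypothetical example of such a design, a parametric Swish with a learnable slope beta interpolates between a near-linear gate and ReLU-like behavior. The scalar sketch below illustrates only the curve; in the actual codebase, CustomActivation would be an nn.Module with beta registered as an nn.Parameter in __init__ so SGD can tune it:

```python
import math

class ParametricSwishSketch:
    """Scalar sketch of x * sigmoid(beta * x).

    beta = 1 recovers SiLU; beta -> infinity approaches ReLU.
    This is an illustrative design, not the benchmark's solution;
    the real module would hold beta as a learnable nn.Parameter.
    """

    def __init__(self, beta: float = 1.0):
        self.beta = beta  # learnable in the real nn.Module

    def __call__(self, x: float) -> float:
        return x / (1.0 + math.exp(-self.beta * x))
```

A single scalar beta is the simplest choice; per-channel parameters (one beta per feature map) would also fit the "channel-wise operation" allowance above at negligible parameter cost.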

The activation is used in:

  • ResNet: BasicBlock (2x per block) + initial conv
  • VGG: after every Conv-BN pair + in the classifier head

Evaluation

  • Metric: Best test accuracy (%, higher is better)
  • Architectures & datasets:
    • ResNet-20 on CIFAR-10 (shallow residual, 10 classes)
    • VGG-16-BN on CIFAR-100 (deep non-residual with BatchNorm, 100 classes)
    • MobileNetV2 on FashionMNIST (lightweight inverted-residual with ReLU6 baseline, 10 classes) — hidden, evaluated on final submission only
  • Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
  • Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
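For reference, cosine annealing over the stated 200 epochs decays the learning rate from 0.1 toward a floor as lr(t) = eta_min + 0.5 * (lr0 - eta_min) * (1 + cos(pi * t / T)). A minimal sketch, assuming eta_min = 0 (the floor is not stated above):

```python
import math

def cosine_lr(epoch: int, base_lr: float = 0.1, total_epochs: int = 200,
              eta_min: float = 0.0) -> float:
    # Cosine annealing: base_lr at epoch 0, decaying to eta_min
    # at the final epoch, matching the stated lr=0.1 / 200-epoch setup.
    return eta_min + 0.5 * (base_lr - eta_min) * (
        1.0 + math.cos(math.pi * epoch / total_epochs))
```

This is the schedule shape PyTorch's CosineAnnealingLR produces per step; the decay is slow early (when lr is high) and late (near the floor), with the steepest drop mid-training.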

Code

custom_activation.py
    """CV Activation Function Benchmark.

    Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST
    to evaluate custom activation functions.

    FIXED: Model architectures, data pipeline, training loop.
    EDITABLE: CustomActivation class.

    Usage:
        python custom_activation.py --arch resnet20 --dataset cifar10 --seed 42
        python custom_activation.py --arch mobilenetv2 --dataset fmnist --seed 42
    """

    import argparse
    import math

Results

Best test accuracy (%):

Model                           Type      resnet20-cifar10  vgg16bn-cifar100  mobilenetv2-fmnist
gelu                            baseline  93.110            71.380            94.750
mish                            baseline  92.780            70.500            94.700
silu                            baseline  92.720            70.380            94.690
anthropic/claude-opus-4.6       vanilla   92.840            70.880            94.820
deepseek-reasoner               vanilla   93.010            73.870            94.740
google/gemini-3.1-pro-preview   vanilla   92.370            1.000             94.540
openai/gpt-5.4                  vanilla   91.140            68.050            93.530
qwen/qwen3.6-plus               vanilla   93.100            72.700            94.900
anthropic/claude-opus-4.6       agent     92.840            70.880            94.820
deepseek-reasoner               agent     93.010            73.870            94.740
google/gemini-3.1-pro-preview   agent     92.680            73.580            94.910
openai/gpt-5.4                  agent     92.500            70.120            94.350
qwen/qwen3.6-plus               agent     93.100            72.700            94.900

Agent Conversations