dl-activation-function

Tags: Deep Learning, pytorch-vision, rigorous codebase

Description

DL Activation Function Design

Research Question

Design a novel activation function for deep convolutional neural networks that improves test accuracy across different architectures (ResNet, VGG) and datasets (CIFAR-10, CIFAR-100).

Background

Activation functions introduce nonlinearity into neural networks and critically affect training dynamics and generalization. Classic choices include:

  • ReLU (2010): max(0, x) — simple, sparse, but zero gradient for negative inputs ("dying ReLU")
  • GELU (2016): x * Phi(x) — smooth approximation weighting by Gaussian CDF
  • Swish/SiLU (2017): x * sigmoid(x) — self-gated, smooth, non-monotonic
  • Mish (2019): x * tanh(softplus(x)) — self-regularized, smooth

These functions differ in smoothness, gating behavior, and negative-domain behavior, and may interact differently with modern network components such as residual connections and batch normalization.
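As a concrete reference, the four activations above can be written as scalar Python functions directly from the listed formulas (the benchmark itself applies them element-wise as PyTorch modules; this is only a numerical sketch):

```python
import math

def relu(x: float) -> float:
    # max(0, x): zero gradient for x < 0 ("dying ReLU")
    return max(0.0, x)

def gelu(x: float) -> float:
    # x * Phi(x), where Phi is the standard Gaussian CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def silu(x: float) -> float:
    # Swish/SiLU: x * sigmoid(x); smooth and non-monotonic
    return x / (1.0 + math.exp(-x))

def mish(x: float) -> float:
    # x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)
    return x * math.tanh(math.log1p(math.exp(x)))
```

All four agree at x = 0 and approach the identity for large positive x; they differ mainly in the negative domain, where GELU, SiLU, and Mish dip below zero before saturating while ReLU is exactly zero.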

What You Can Modify

You may modify the CustomActivation class (lines 31-48) in custom_activation.py: an nn.Module used as a drop-in replacement for ReLU throughout the network.

You can modify:

  • The forward computation (any element-wise or channel-wise operation)
  • Learnable parameters (registered in __init__)
  • The shape of the activation curve (monotonic, non-monotonic, bounded, etc.)
  • Negative-domain behavior (zero, linear, bounded, learnable)
  • Any stateless or stateful activation logic
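As one hypothetical example of such a design, a parametric Swish with a learnable slope beta interpolates between a near-linear gate and ReLU-like behavior. The scalar sketch below illustrates only the curve; in the actual codebase, CustomActivation would be an nn.Module with beta registered as an nn.Parameter in __init__ so SGD can tune it:

```python
import math

class ParametricSwishSketch:
    """Scalar sketch of x * sigmoid(beta * x).

    beta = 1 recovers SiLU; beta -> infinity approaches ReLU.
    This is an illustrative design, not the benchmark's solution;
    the real module would hold beta as a learnable nn.Parameter.
    """

    def __init__(self, beta: float = 1.0):
        self.beta = beta  # learnable in the real nn.Module

    def __call__(self, x: float) -> float:
        return x / (1.0 + math.exp(-self.beta * x))
```

A single scalar beta is the simplest choice; per-channel parameters (one beta per feature map) would also fit the "channel-wise operation" allowance above at negligible parameter cost.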

The activation is used in:

  • ResNet: BasicBlock (2x per block) + initial conv
  • VGG: after every Conv-BN pair + in the classifier head

Evaluation

  • Metric: Best test accuracy (%, higher is better)
  • Architectures & datasets:
    • ResNet-20 on CIFAR-10 (shallow residual, 10 classes)
    • VGG-16-BN on CIFAR-100 (deep non-residual with BatchNorm, 100 classes)
    • MobileNetV2 on FashionMNIST (lightweight inverted-residual with ReLU6 baseline, 10 classes) — hidden, evaluated on final submission only
  • Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
  • Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
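For reference, cosine annealing over the stated 200 epochs decays the learning rate from 0.1 toward a floor as lr(t) = eta_min + 0.5 * (lr0 - eta_min) * (1 + cos(pi * t / T)). A minimal sketch, assuming eta_min = 0 (the floor is not stated above):

```python
import math

def cosine_lr(epoch: int, base_lr: float = 0.1, total_epochs: int = 200,
              eta_min: float = 0.0) -> float:
    # Cosine annealing: base_lr at epoch 0, decaying to eta_min
    # at the final epoch, matching the stated lr=0.1 / 200-epoch setup.
    return eta_min + 0.5 * (base_lr - eta_min) * (
        1.0 + math.cos(math.pi * epoch / total_epochs))
```

This is the schedule shape PyTorch's CosineAnnealingLR produces per step; the decay is slow early (when lr is high) and late (near the floor), with the steepest drop mid-training.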

Code

custom_activation.py
    """CV Activation Function Benchmark.

    Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST
    to evaluate custom activation functions.

    FIXED: Model architectures, data pipeline, training loop.
    EDITABLE: CustomActivation class.

    Usage:
        python custom_activation.py --arch resnet20 --dataset cifar10 --seed 42
        python custom_activation.py --arch mobilenetv2 --dataset fmnist --seed 42
    """

    import argparse
    import math

Results

Best test accuracy (%):

Model                           Type      resnet20-cifar10  vgg16bn-cifar100  mobilenetv2-fmnist
gelu                            baseline  93.110            71.380            94.750
mish                            baseline  92.780            70.500            94.700
silu                            baseline  92.720            70.380            94.690
anthropic/claude-opus-4.6       vanilla   92.840            70.880            94.820
deepseek-reasoner               vanilla   93.010            73.870            94.740
google/gemini-3.1-pro-preview   vanilla   92.370            1.000             94.540
openai/gpt-5.4                  vanilla   91.140            68.050            93.530
qwen/qwen3.6-plus               vanilla   93.100            72.700            94.900
anthropic/claude-opus-4.6       agent     92.840            70.880            94.820
deepseek-reasoner               agent     93.010            73.870            94.740
google/gemini-3.1-pro-preview   agent     92.680            73.580            94.910
openai/gpt-5.4                  agent     92.500            70.120            94.350
qwen/qwen3.6-plus               agent     93.100            72.700            94.900

Agent Conversations