dl-activation-function
Deep Learning · pytorch-vision · rigorous codebase
Description
DL Activation Function Design
Research Question
Design a novel activation function for deep convolutional neural networks that improves test accuracy across different architectures (ResNet, VGG) and datasets (CIFAR-10, CIFAR-100).
Background
Activation functions introduce nonlinearity into neural networks and critically affect training dynamics and generalization. Classic choices include:
- ReLU (2010): max(0, x) — simple, sparse, but zero gradient for negative inputs ("dying ReLU")
- GELU (2016): x * Phi(x) — smooth; weights the input by the Gaussian CDF Phi
- Swish/SiLU (2017): x * sigmoid(x) — self-gated, smooth, non-monotonic
- Mish (2019): x * tanh(softplus(x)) — self-regularized, smooth
These functions differ in smoothness, gating behavior, and negative-domain behavior, and may interact differently with modern network components such as residual connections and batch normalization.
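To make the differences concrete, here is a minimal stdlib-only sketch of the four baselines as scalar functions (the benchmark itself applies them element-wise to tensors):

```python
import math

def relu(x):
    # max(0, x): sparse, but zero gradient for x < 0 ("dying ReLU")
    return max(0.0, x)

def gelu(x):
    # x * Phi(x), where Phi is the standard normal CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def silu(x):
    # x * sigmoid(x): Swish with beta = 1; smooth and non-monotonic
    return x / (1.0 + math.exp(-x))

def mish(x):
    # x * tanh(softplus(x)): smooth, approaches identity for large x
    return x * math.tanh(math.log1p(math.exp(x)))
```

Note that all four agree with the identity for large positive inputs and differ mainly in how they treat the negative domain: ReLU zeroes it, while GELU, SiLU, and Mish allow a small, smooth negative response.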
What You Can Modify
The CustomActivation class (lines 31-48) in custom_activation.py. This is an nn.Module used as a drop-in replacement for ReLU throughout the network.
You can modify:
- The forward computation (any element-wise or channel-wise operation)
- Learnable parameters (registered in __init__)
- The shape of the activation curve (monotonic, non-monotonic, bounded, etc.)
- Negative-domain behavior (zero, linear, bounded, learnable)
- Any stateless or stateful activation logic
The activation is used in:
- ResNet: BasicBlock (2x per block) + initial conv
- VGG: after every Conv-BN pair + in the classifier head
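As one illustration of the allowed edit surface, a hypothetical submission might replace the class body with a learnable-beta Swish. This is a sketch, not the actual CustomActivation shipped in custom_activation.py:

```python
import torch
import torch.nn as nn

class CustomActivation(nn.Module):
    """Hypothetical example: learnable-beta Swish, x * sigmoid(beta * x).

    beta = 1 recovers SiLU; as beta grows the curve approaches ReLU,
    so the network can interpolate between the two during training.
    """

    def __init__(self, init_beta: float = 1.0):
        super().__init__()
        # A single scalar shared across all channels; a channel-wise
        # variant would register a per-channel parameter tensor instead.
        self.beta = nn.Parameter(torch.tensor(float(init_beta)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)
```

Because the module is stateless apart from its registered parameters, it drops into every ReLU slot listed above without changes to the fixed architecture code.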
Evaluation
- Metric: Best test accuracy (%, higher is better)
- Architectures & datasets:
- ResNet-20 on CIFAR-10 (shallow residual, 10 classes)
- VGG-16-BN on CIFAR-100 (deep non-residual with BatchNorm, 100 classes)
- MobileNetV2 on FashionMNIST (lightweight inverted-residual with ReLU6 baseline, 10 classes) — hidden, evaluated on final submission only
- Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
- Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
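The cosine-annealing schedule used for training has a closed form; a stdlib sketch matching PyTorch's CosineAnnealingLR with eta_min=0 (the epoch count and lr_max=0.1 come from the setup above):

```python
import math

def cosine_lr(epoch: int, total_epochs: int,
              lr_max: float = 0.1, lr_min: float = 0.0) -> float:
    # Anneals from lr_max at epoch 0 down to lr_min at total_epochs,
    # following half a cosine period.
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1.0 + math.cos(math.pi * epoch / total_epochs))
```

Over the 200-epoch run this starts at 0.1, passes 0.05 at the midpoint, and decays to 0 at the end.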
Code
custom_activation.py
1"""CV Activation Function Benchmark.23Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST4to evaluate custom activation functions.56FIXED: Model architectures, data pipeline, training loop.7EDITABLE: CustomActivation class.89Usage:10python custom_activation.py --arch resnet20 --dataset cifar10 --seed 4211python custom_activation.py --arch mobilenetv2 --dataset fmnist --seed 4212"""1314import argparse15import math
Results
| Model | Type | test acc resnet20-cifar10 ↑ | test acc vgg16bn-cifar100 ↑ | test acc mobilenetv2-fmnist ↑ |
|---|---|---|---|---|
| gelu | baseline | 93.110 | 71.380 | 94.750 |
| mish | baseline | 92.780 | 70.500 | 94.700 |
| silu | baseline | 92.720 | 70.380 | 94.690 |
| anthropic/claude-opus-4.6 | vanilla | 92.840 | 70.880 | 94.820 |
| deepseek-reasoner | vanilla | 93.010 | 73.870 | 94.740 |
| google/gemini-3.1-pro-preview | vanilla | 92.370 | 1.000 | 94.540 |
| openai/gpt-5.4 | vanilla | 91.140 | 68.050 | 93.530 |
| qwen/qwen3.6-plus | vanilla | 93.100 | 72.700 | 94.900 |
| anthropic/claude-opus-4.6 | agent | 92.840 | 70.880 | 94.820 |
| deepseek-reasoner | agent | 93.010 | 73.870 | 94.740 |
| google/gemini-3.1-pro-preview | agent | 92.680 | 73.580 | 94.910 |
| openai/gpt-5.4 | agent | 92.500 | 70.120 | 94.350 |
| qwen/qwen3.6-plus | agent | 93.100 | 72.700 | 94.900 |