Agent Conversation: deepseek-reasoner
Model: deepseek-reasoner
Total steps: 9
Messages: 20
User (Step 0)
Task: dl-activation-function
DL Activation Function Design
Research Question
Design a novel activation function for deep convolutional neural networks that improves test accuracy across different architectures (ResNet, VGG) and datasets (CIFAR-10, CIFAR-100).
Background
Activation functions introduce nonlinearity into neural networks and critically affect training dynamics and generalization. Classic choices include:
- ReLU (2010): max(0, x) — simple, sparse, but zero gradient for negative inputs ("dying ReLU")
- GELU (2016): x * Phi(x) — smooth approximation weighting by Gaussian CDF
- Swish/SiLU (2017): x * sigmoid(x) — self-gated, smooth, non-monotonic
- Mish (2019): x * tanh(softplus(x)) — self-regularized, smooth
These functions differ in smoothness, gating behavior, and negative-domain behavior, and may interact differently with modern network components such as residual connections and batch normalization.
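The four classic activations listed above can be sketched as plain element-wise functions (a minimal pure-Python illustration for reference; `math.erf` gives the exact Gaussian CDF for GELU):

```python
import math

def relu(x):
    # max(0, x): zero gradient for x < 0 ("dying ReLU")
    return max(0.0, x)

def gelu(x):
    # x * Phi(x), with Phi the standard Gaussian CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def swish(x):
    # x * sigmoid(x): self-gated, smooth, non-monotonic
    return x / (1.0 + math.exp(-x))

def mish(x):
    # x * tanh(softplus(x)): smooth, self-regularized
    return x * math.tanh(math.log1p(math.exp(x)))
```

Note the non-monotonicity of Swish and Mish in the negative domain: they dip below zero near the origin and return toward zero as x decreases further, unlike ReLU's hard cutoff.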
What You Can Modify
The CustomActivation class (lines 31-48) in custom_activation.py. This is an nn.Module used as a drop-in replacement for ReLU throughout the network.
You can modify:
- The forward computation (any element-wise or channel-wise operation)
- Learnable parameters (registered in __init__)
- The shape of the activation curve (monotonic, non-monotonic, bounded, etc.)
- Negative-domain behavior (zero, linear, bounded, learnable)
- Any stateless or stateful activation logic
The activation is used in:
- ResNet: BasicBlock (2x per block) + initial conv
- VGG: after every Conv-BN pair + in the classifier head
Evaluation
- Metric: Best test accuracy (%, higher is better)
- Architectures & datasets:
- ResNet-20 on CIFAR-10 (shallow residual, 10 classes)
- VGG-16-BN on CIFAR-100 (deep non-residual with BatchNorm, 100 classes)
- MobileNetV2 on FashionMNIST (lightweight inverted-residual with ReLU6 baseline, 10 classes) — hidden, evaluated on final submission only
- Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
- Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
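The cosine-annealing schedule decays the learning rate from 0.1 toward 0 over the 200 epochs; the lr values printed in the training logs (e.g. lr=0.009549 at epoch 160, lr=0.000616 at epoch 190) are consistent with the standard closed form, sketched here assuming eta_min = 0:

```python
import math

def cosine_lr(epoch, base_lr=0.1, total_epochs=200, eta_min=0.0):
    # eta_min + 0.5 * (base_lr - eta_min) * (1 + cos(pi * epoch / total_epochs))
    return eta_min + 0.5 * (base_lr - eta_min) * (
        1.0 + math.cos(math.pi * epoch / total_epochs)
    )
```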
pytorch-vision/custom_activation.py [EDITABLE — lines 32–49 only]
1: """CV Activation Function Benchmark.
2:
3: Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST
4: to evaluate custom activation functions.
5:
6: FIXED: Model architectures, data pipeline, training loop.
7: EDITABLE: CustomActivation class.
8:
9: Usage:
10: python custom_activation.py --arch resnet20 --dataset cifar10 --seed 42
11: python custom_activation.py --arch mobilenetv2 --dataset fmnist --seed 42
12: """
13:
14: import argparse
15: import math
16: import os
17: import time
18:
19: import torch
20: import torch.nn as nn
21: import torch.nn.functional as F
22: import torch.optim as optim
23: import torchvision
24: import torchvision.t
Assistant (Step 1): edit
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_activation.py",
  "start_line": 32,
  "end_line": 49,
  "content": "class CustomActivation(nn.Module):\n    \"\"\"Parametric Swish activation with learnable temperature and shift.\n\n    PSwish(x) = x * sigmoid(α * x + β)\n    where α > 0 (via softplus) and β are learnable parameters per activation instance.\n    This generalizes Swish (α=1, β=0) and allows adaptive gating behavior.\n    \"\"\"\n\n    def __init__(self):\n        super().__init__()\n        # Learnable parameters\n        self.theta = nn.Parameter(torch.tensor(0.0))  # unconstrained for α\n        self.beta = nn.Parameter(torch.tensor(0.0))   # shift\n\n    def forward(self, x):\n        alpha = F.softplus(self.theta)  # ensure positivity\n        return x * torch.sigmoid(alpha * x + self.beta)"
}
Tool Result (Step 1)
OK: Replaced lines 32..49 with 17 line(s) in pytorch-vision/custom_activation.py. Editable range: 32–48.
[Current file: pytorch-vision/custom_activation.py | editable: 32–48 | total: 428 lines]
29| # ============================================================================
30|
31| # -- EDITABLE REGION START (lines 32-49) --------------------------------------
32 class CustomActivation(nn.Module):
33 """Parametric Swish activation with learnable temperature and shift.
34
35 PSwish(x) = x * sigmoid(α * x + β)
36 where α > 0 (via softplus) and β are learnable parameters per activation instance.
37 This generalizes Swish (α=1, β=0) and allows adaptive gating behavior.
38 """
39
40 def __init__(self):
41 super().__init__()
42 # Learnable parameters
43 self.theta = nn.Parameter(torch.tensor(0.0)) # unconstrained for α
44 self.beta = nn.Parameter(torch.t
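One subtlety of this initialization: at init the proposed PSwish is close to, but not exactly, Swish, since softplus(0) = ln 2 ≈ 0.693 rather than 1, so the gate starts slightly softer. A pure-Python check of the forward pass, assuming the formula in the docstring above:

```python
import math

def pswish(x, theta=0.0, beta=0.0):
    # PSwish(x) = x * sigmoid(alpha * x + beta), with alpha = softplus(theta)
    alpha = math.log1p(math.exp(theta))  # softplus, guarantees alpha > 0
    return x / (1.0 + math.exp(-(alpha * x + beta)))

# at theta = 0, alpha = softplus(0) = ln 2, not the Swish value alpha = 1
```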
Assistant (Step 2): test
Tool Result (Step 2)
[Test #1] (2 tests remaining; call submit(n=N) to choose which test result to submit as final)

### resnet20-cifar10 (scripts/resnet20_cifar10.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.0804 train_acc=97.31 test_loss=0.3131 test_acc=90.52 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=0.0446 train_acc=98.64 test_loss=0.3103 test_acc=91.32 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=0.0213 train_acc=99.53 test_loss=0.2910 test_acc=92.43 lr=0.002447
TRAIN_METRICS: epoch=190 train_loss=0.0128 train_acc=99.78 test_loss=0.2793 test_acc=92.79 lr=0.000616
TRAIN_METRICS: epoch=200 train_loss=0.0115 train_acc=99.84 test_loss=0.2779 test_acc=92.87 lr=0.000000
Final metrics (resnet20-cifar10): test_acc_resnet20-cifar10=93.01

### vgg16bn-cifar100 (scripts/vgg16bn_cifar100.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.1574 train_acc=95.29 test_loss=1.5851 test_acc=69.12 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=0.0551 train_ac
Assistant (Step 3): undo
Tool Input
{
  "n": 1
}
Tool Result (Step 3)
Undo complete: Restored: pytorch-vision/custom_activation.py
Assistant (Step 4): edit
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_activation.py",
  "start_line": 32,
  "end_line": 49,
  "content": "class CustomActivation(nn.Module):\n    \"\"\"Learnable Exponential Linear Unit (LELU).\n\n    LELU(x) = x if x >= 0 else α * (exp(x) - 1)\n    where α > 0 is a learnable parameter per activation instance.\n    This generalizes ELU (α=1) and allows adaptive negative saturation level.\n    \"\"\"\n\n    def __init__(self):\n        super().__init__()\n        # Learnable parameter α, initialized to 1.0 (standard ELU)\n        self.log_alpha = nn.Parameter(torch.tensor(0.0))  # log(α)\n\n    def forward(self, x):\n        alpha = torch.exp(self.log_alpha)  # ensures positivity\n        return torch.where(x >= 0, x, alpha * (torch.exp(x) - 1))"
}
Tool Result (Step 4)
OK: Replaced lines 32..49 with 16 line(s) in pytorch-vision/custom_activation.py. Editable range: 32–47.
[Current file: pytorch-vision/custom_activation.py | editable: 32–47 | total: 427 lines]
29| # ============================================================================
30|
31| # -- EDITABLE REGION START (lines 32-49) --------------------------------------
32 class CustomActivation(nn.Module):
33 """Learnable Exponential Linear Unit (LELU).
34
35 LELU(x) = x if x >= 0 else α * (exp(x) - 1)
36 where α > 0 is a learnable parameter per activation instance.
37 This generalizes ELU (α=1) and allows adaptive negative saturation level.
38 """
39
40 def __init__(self):
41 super().__init__()
42 # Learnable parameter α, initialized to 1.0 (standard ELU)
43 self.log_alpha = nn.Parameter(torch.tensor(0.0)) # log(α)
44
45 def forward(self, x):
46
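With log_alpha initialized to 0, alpha = exp(0) = 1 and the function reduces exactly to standard ELU; the negative branch saturates at -alpha as x decreases. A pure-Python sketch of the forward rule from the docstring above:

```python
import math

def lelu(x, log_alpha=0.0):
    # LELU(x) = x for x >= 0, else alpha * (exp(x) - 1), with alpha = exp(log_alpha)
    alpha = math.exp(log_alpha)
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)
```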
Assistant (Step 5): test
Tool Result (Step 5)
[Test #2] (1 test remaining; call submit(n=N) to choose which test result to submit as final)

### resnet20-cifar10 (scripts/resnet20_cifar10.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.1292 train_acc=95.64 test_loss=0.3142 test_acc=90.33 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=0.0847 train_acc=97.26 test_loss=0.3085 test_acc=90.50 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=0.0482 train_acc=98.65 test_loss=0.2777 test_acc=91.79 lr=0.002447
TRAIN_METRICS: epoch=190 train_loss=0.0308 train_acc=99.27 test_loss=0.2762 test_acc=91.91 lr=0.000616
TRAIN_METRICS: epoch=200 train_loss=0.0268 train_acc=99.46 test_loss=0.2732 test_acc=92.10 lr=0.000000
Final metrics (resnet20-cifar10): test_acc_resnet20-cifar10=92.19

### vgg16bn-cifar100 (scripts/vgg16bn_cifar100.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.1674 train_acc=94.82 test_loss=1.5800 test_acc=69.13 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=0.0714 train_acc
Assistant (Step 6): undo
Tool Input
{
  "n": 1
}
Tool Result (Step 6)
Undo complete: Restored: pytorch-vision/custom_activation.py
Assistant (Step 7): edit
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_activation.py",
  "start_line": 32,
  "end_line": 49,
  "content": "class CustomActivation(nn.Module):\n    \"\"\"Smooth Leaky ReLU (SLeakyReLU) with learnable mixture.\n\n    SLeakyReLU(x) = x * [β + (1-β) * sigmoid(α*x)]\n    where α > 0 (via softplus) and β ∈ [0,1] (via sigmoid) are learnable.\n    This smoothly interpolates between linear (β=1) and Swish (β=0).\n    The parameter α controls the sharpness of transition.\n    \"\"\"\n\n    def __init__(self):\n        super().__init__()\n        # Learnable parameters\n        self.theta = nn.Parameter(torch.tensor(0.0))  # unconstrained for α\n        self.phi = nn.Parameter(torch.tensor(-float('inf')))  # initialized to β=0 (Swish)\n\n    def forward(self, x):\n        alpha = F.softplus(self.theta)\n        beta = torch.sigmoid(self.phi)  # β ∈ [0,1]\n        return x * (beta + (1 - beta) * torch.sigmoid(alpha * x))"
}
Tool Result (Step 7)
OK: Replaced lines 32..49 with 19 line(s) in pytorch-vision/custom_activation.py. Editable range: 32–50.
[Current file: pytorch-vision/custom_activation.py | editable: 32–50 | total: 430 lines]
29| # ============================================================================
30|
31| # -- EDITABLE REGION START (lines 32-49) --------------------------------------
32 class CustomActivation(nn.Module):
33 """Smooth Leaky ReLU (SLeakyReLU) with learnable mixture.
34
35 SLeakyReLU(x) = x * [β + (1-β) * sigmoid(α*x)]
36 where α > 0 (via softplus) and β ∈ [0,1] (via sigmoid) are learnable.
37 This smoothly interpolates between linear (β=1) and Swish (β=0).
38 The parameter α controls the sharpness of transition.
39 """
40
41 def __init__(self):
42 super().__init__()
43 # Learnable parameters
44 self.theta = nn.Parameter(torch.tensor(0.0)) # unconstrained for α
Assistant (Step 8): test
Tool Result (Step 8)
[Test #3] (0 tests remaining; call submit(n=N) to choose which test result to submit as final)
[NOTE] This was your last test. You MUST now call submit(n=X) to choose which test result to submit as your final answer.

### resnet20-cifar10 (scripts/resnet20_cifar10.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=nan train_acc=10.00 test_loss=nan test_acc=10.00 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=nan train_acc=10.00 test_loss=nan test_acc=10.00 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=nan train_acc=10.00 test_loss=nan test_acc=10.00 lr=0.002447
TRAIN_METRICS: epoch=190 train_loss=nan train_acc=10.00 test_loss=nan test_acc=10.00 lr=0.000616
TRAIN_METRICS: epoch=200 train_loss=nan train_acc=10.00 test_loss=nan test_acc=10.00 lr=0.000000
Final metrics (resnet20-cifar10): test_acc_resnet20-cifar10=10.00

### vgg16bn-cifar100 (scripts/vgg16bn_cifar100.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=nan train_acc=1.00 test_l
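One plausible cause of the NaN collapse (a hedged reconstruction, not confirmed by the logs): the step-7 code initializes phi to -float('inf'), and under SGD with weight decay (wd=5e-4) the decay term contributes wd * phi = -inf to the gradient, so the very first parameter update computes -inf + inf = NaN, which then propagates through the network. The arithmetic can be reproduced in isolation:

```python
import math

phi = -math.inf          # parameter initialized to -float('inf') as in step 7
lr, wd = 0.1, 5e-4       # training hyperparameters from the task description

grad = wd * phi          # weight-decay contribution to the gradient: -inf
phi = phi - lr * grad    # -inf - 0.1 * (-inf) = -inf + inf = nan
assert math.isnan(phi)
```

A finite initialization such as phi = -10 (beta = sigmoid(-10) ≈ 4.5e-5, still effectively Swish) would avoid this failure mode.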
Assistant (Step 9): submit
Tool Input
{
  "n": 1
}
Tool Result (Step 9)
[submit] Submitting result from test #1 as final.

### resnet20-cifar10 (scripts/resnet20_cifar10.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.0804 train_acc=97.31 test_loss=0.3131 test_acc=90.52 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=0.0446 train_acc=98.64 test_loss=0.3103 test_acc=91.32 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=0.0213 train_acc=99.53 test_loss=0.2910 test_acc=92.43 lr=0.002447
TRAIN_METRICS: epoch=190 train_loss=0.0128 train_acc=99.78 test_loss=0.2793 test_acc=92.79 lr=0.000616
TRAIN_METRICS: epoch=200 train_loss=0.0115 train_acc=99.84 test_loss=0.2779 test_acc=92.87 lr=0.000000
Final metrics (resnet20-cifar10): test_acc_resnet20-cifar10=93.01

### vgg16bn-cifar100 (scripts/vgg16bn_cifar100.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.1574 train_acc=95.29 test_loss=1.5851 test_acc=69.12 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=0.0551 train_acc=98.41 test_loss=1.5952 test_acc=71.38 lr=0.