Agent Conversation: deepseek-reasoner
Task: dl-normalization
DL Normalization Layer Design
Research Question
Design a novel normalization layer for deep convolutional neural networks that improves training stability and final test accuracy across different architectures and datasets.
Background
Normalization layers are critical components in modern deep networks, controlling internal covariate shift and enabling stable training at higher learning rates. Classic methods include:
- BatchNorm (Ioffe & Szegedy, 2015): Normalizes across the batch dimension per channel. The de facto standard, but depends on batch statistics and behaves differently at train/test time.
- GroupNorm (Wu & He, 2018): Divides channels into groups and normalizes within each group. Batch-size independent.
- InstanceNorm (Ulyanov et al., 2016): Normalizes each channel independently per instance. Common in style transfer.
- LayerNorm (Ba et al., 2016): Normalizes across all channels for each sample. Standard in transformers but less common in CNNs.
However, each method has limitations: BatchNorm degrades with small batches, GroupNorm requires choosing the number of groups, InstanceNorm discards inter-channel information, and LayerNorm may not suit spatial feature maps well. There is room to design normalization strategies that combine strengths of multiple approaches or introduce novel normalization statistics.
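These methods differ only in which dimensions the mean and variance are computed over. A minimal sketch of the four statistics (shapes and group count are illustrative, not taken from the benchmark):

```python
import torch

x = torch.randn(8, 32, 16, 16)  # [B, C, H, W]

# BatchNorm: statistics over (B, H, W) -> one mean/var per channel
bn_mean = x.mean(dim=(0, 2, 3))                  # shape [C]
# LayerNorm (CNN variant): statistics over (C, H, W) -> one per sample
ln_mean = x.mean(dim=(1, 2, 3))                  # shape [B]
# InstanceNorm: statistics over (H, W) -> one per (sample, channel)
in_mean = x.mean(dim=(2, 3))                     # shape [B, C]
# GroupNorm: split C into G groups, statistics over (C/G, H, W)
G = 4
gn_mean = x.view(8, G, 32 // G, 16, 16).mean(dim=(2, 3, 4))  # shape [B, G]
```

The corresponding variances are taken over the same dimensions; each method then standardizes with its statistics and applies a per-channel affine transform.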
What You Can Modify
You can edit the CustomNorm class (lines 31–45) in custom_norm.py. This class must be a drop-in replacement for nn.BatchNorm2d:
- Constructor takes num_features (number of channels C)
- Input shape: [B, C, H, W]
- Output shape: [B, C, H, W]
You can modify:
- The normalization statistics (mean/variance computation: over batch, channel, spatial, or combinations)
- Learnable affine parameters (scale and shift)
- The normalization grouping strategy
- Combining multiple normalization approaches
- Adaptive or input-dependent normalization
- Any other normalization design that maintains the interface
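Whatever design is chosen, it must preserve the interface above. A minimal valid skeleton (this one simply does per-instance spatial standardization with a learnable affine; it is shown only to illustrate the contract, not as a competitive design):

```python
import torch
import torch.nn as nn

class CustomNorm(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.num_features = num_features
        # learnable per-channel affine parameters, as in BatchNorm2d
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):  # x: [B, C, H, W]
        # standardize each (sample, channel) plane over its spatial dims
        mean = x.mean(dim=(2, 3), keepdim=True)
        var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + 1e-5)
        return x_hat * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)
```

Because the constructor signature and tensor shapes match nn.BatchNorm2d, the fixed model-building code can instantiate this class anywhere it would use a BatchNorm layer.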
Evaluation
- Metric: Best test accuracy (%, higher is better)
- Architectures & datasets:
- ResNet-56 on CIFAR-100 (deep residual, 100 classes)
- MobileNetV2 on FashionMNIST (lightweight inverted-residual, 10 classes)
- ResNet-110 on CIFAR-100 (very deep residual, 100 classes) — hidden, evaluated on final submission only
- Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
- Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
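The fixed recipe corresponds to a standard PyTorch optimizer/scheduler setup; a sketch of that configuration (the `model` here is a stand-in, and the real training loop lives in the fixed portion of custom_norm.py):

```python
import torch
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Conv2d(3, 16, 3)  # stand-in; the benchmark builds ResNet/MobileNetV2
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=200)  # anneal over 200 epochs

for epoch in range(200):
    # ... one training pass over the loader goes here ...
    scheduler.step()  # lr follows a cosine curve from 0.1 down to ~0
```

With T_max equal to the epoch count, the learning rate decays smoothly to approximately zero by the final epoch, which matches the lr=0.000000 values seen in the training logs below.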
pytorch-vision/custom_norm.py [EDITABLE — lines 31–45 only]
1: """CV Normalization Layer Benchmark.
2:
3: Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST to evaluate
4: normalization layer designs.
5:
6: FIXED: Model architectures, data pipeline, training loop.
7: EDITABLE: CustomNorm class.
8:
9: Usage:
10: python custom_norm.py --arch resnet20 --dataset cifar10 --seed 42
11:
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_norm.py",
  "start_line": 31,
  "end_line": 45,
  "content": "class CustomNorm(nn.Module):\n    \"\"\"Custom normalization layer. Drop-in replacement for BatchNorm2d.\n\n    Args:\n        num_features: number of channels C\n    Input: [B, C, H, W]\n    Output: [B, C, H, W]\n    \"\"\"\n\n    def __init__(self, num_features):\n        super().__init__()\n        self.num_features = num_features\n        # automatic group selection\n        num_groups = min(32, num_features)\n        while num_features % num_groups != 0:\n            num_groups -= 1\n        self.norm = nn.GroupNorm(num_groups, num_features)\n        # per-channel gate parameter (logit)\n        self.gate_logit = nn.Parameter(torch.full((num_features,), 2.0))\n        # learnable affine parameters\n        self.weight = nn.Parameter(torch.ones(num_features))\n        self.bias = nn.Parameter(torch.zeros(num_features))\n\n    def forward(self, x):\n        # compute group normalization\n        normalized = self.norm(x)\n        # compute gate per channel\n        gate = torch.sigmoid(self.gate_logit.view(1, -1, 1, 1))\n        # residual mixing\n        out = gate * normalized + (1 - gate) * x\n        # affine transformation\n        out = out * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)\n        return out"
}
OK: Replaced lines 31..45 with 33 line(s) in pytorch-vision/custom_norm.py. Editable range: 31–63.
[Current file: pytorch-vision/custom_norm.py | editable: 31–63 | total: 449 lines]
28| # ============================================================================
29|
30| # -- EDITABLE REGION START (lines 31-45) --------------------------------------
31 class CustomNorm(nn.Module):
32     """Custom normalization layer. Drop-in replacement for BatchNorm2d.
33
34     Args:
35         num_features: number of channels C
36     Input: [B, C, H, W]
37     Output: [B, C, H, W]
38     """
39
40     def __init__(self, num_features):
41         super().__init__()
42         self.num_features = num_features
43         # automatic group selection
44         num_groups = min(32, num_features)
45         while num_features % num_groups != 0:
46             num_groups -= 1
47         self.norm = nn.Gr
Tool Input
{
  "n": 1
}
Undo complete: Restored: pytorch-vision/custom_norm.py
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_norm.py",
  "start_line": 31,
  "end_line": 45,
  "content": "class CustomNorm(nn.Module):\n    \"\"\"Gated Residual Normalization (GRN): GroupNorm + residual skip.\"\"\"\n    def __init__(self, num_features):\n        super().__init__()\n        g = min(32, num_features)\n        while num_features % g: g -= 1\n        self.norm = nn.GroupNorm(g, num_features)\n        self.gate = nn.Parameter(torch.full((num_features,), 2.0))\n        self.weight = nn.Parameter(torch.ones(num_features))\n        self.bias = nn.Parameter(torch.zeros(num_features))\n    def forward(self, x):\n        n = self.norm(x)\n        g = torch.sigmoid(self.gate.view(1,-1,1,1))\n        return (g*n + (1-g)*x) * self.weight.view(1,-1,1,1) + self.bias.view(1,-1,1,1)"
}
OK: Replaced lines 31..45 with 14 line(s) in pytorch-vision/custom_norm.py. Editable range: 31–44.
[Current file: pytorch-vision/custom_norm.py | editable: 31–44 | total: 430 lines]
28| # ============================================================================
29|
30| # -- EDITABLE REGION START (lines 31-45) --------------------------------------
31 class CustomNorm(nn.Module):
32     """Gated Residual Normalization (GRN): GroupNorm + residual skip."""
33     def __init__(self, num_features):
34         super().__init__()
35         g = min(32, num_features)
36         while num_features % g: g -= 1
37         self.norm = nn.GroupNorm(g, num_features)
38         self.gate = nn.Parameter(torch.full((num_features,), 2.0))
39         self.weight = nn.Parameter(torch.ones(num_features))
40         self.bias = nn.Parameter(torch.zeros(num_features))
41     def forward(self, x):
42         n = self.norm(x)
43
[Test #1] (2 tests remaining; call submit(n=N) to choose which test result to submit as final)

### resnet56-cifar100 (scripts/resnet56_cifar100.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.8453 train_acc=74.01 test_loss=1.3608 test_acc=62.90 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=0.6769 train_acc=79.15 test_loss=1.3260 test_acc=64.39 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=0.5168 train_acc=83.73 test_loss=1.3500 test_acc=65.83 lr=0.002447
TRAIN_METRICS: epoch=190 train_loss=0.3870 train_acc=87.92 test_loss=1.3957 test_acc=66.68 lr=0.000616
TRAIN_METRICS: epoch=200 train_loss=0.3402 train_acc=89.32 test_loss=1.4057 test_acc=67.01 lr=0.000000
Final metrics (resnet56-cifar100): test_acc_resnet56-cifar100=67.14

### mobilenetv2-fmnist (scripts/mobilenetv2_fmnist.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.3357 train_acc=87.61 test_loss=0.3285 test_acc=88.03 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=0.3142
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_norm.py",
  "start_line": 31,
  "end_line": 44,
  "content": "class CustomNorm(nn.Module):\n    \"\"\"Adaptive Norm with Skip Scaling (ANSS).\"\"\"\n    def __init__(self, num_features):\n        super().__init__()\n        g = min(32, num_features)\n        while num_features % g: g -= 1\n        self.norm = nn.GroupNorm(g, num_features)\n        self.gate = nn.Parameter(torch.full((num_features,), 2.0))\n        self.scale = nn.Parameter(torch.ones(num_features))\n        self.weight = nn.Parameter(torch.ones(num_features))\n        self.bias = nn.Parameter(torch.zeros(num_features))\n    def forward(self, x):\n        n = self.norm(x)\n        g = torch.sigmoid(self.gate.view(1,-1,1,1))\n        return (g*n + (1-g)*(self.scale.view(1,-1,1,1)*x)) * self.weight.view(1,-1,1,1) + self.bias.view(1,-1,1,1)"
}
OK: Replaced lines 31..44 with 15 line(s) in pytorch-vision/custom_norm.py. Editable range: 31–45.
[Current file: pytorch-vision/custom_norm.py | editable: 31–45 | total: 431 lines]
28| # ============================================================================
29|
30| # -- EDITABLE REGION START (lines 31-45) --------------------------------------
31 class CustomNorm(nn.Module):
32     """Adaptive Norm with Skip Scaling (ANSS)."""
33     def __init__(self, num_features):
34         super().__init__()
35         g = min(32, num_features)
36         while num_features % g: g -= 1
37         self.norm = nn.GroupNorm(g, num_features)
38         self.gate = nn.Parameter(torch.full((num_features,), 2.0))
39         self.scale = nn.Parameter(torch.ones(num_features))
40         self.weight = nn.Parameter(torch.ones(num_features))
41         self.bias = nn.Parameter(torch.zeros(num_features))
42     def forward(self,
[Test #2] (1 test remaining; call submit(n=N) to choose which test result to submit as final)

### resnet56-cifar100 (scripts/resnet56_cifar100.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=1.1686 train_acc=65.59 test_loss=1.4942 test_acc=58.22 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=1.0153 train_acc=69.42 test_loss=1.4418 test_acc=60.13 lr=0.005450
TRAIN_METRICS: epoch=180 train_loss=0.8592 train_acc=73.62 test_loss=1.4140 test_acc=60.91 lr=0.002447
TRAIN_METRICS: epoch=190 train_loss=0.7275 train_acc=77.68 test_loss=1.3798 test_acc=63.18 lr=0.000616
TRAIN_METRICS: epoch=200 train_loss=0.6706 train_acc=79.48 test_loss=1.3792 test_acc=63.58 lr=0.000000
Final metrics (resnet56-cifar100): test_acc_resnet56-cifar100=63.70

### mobilenetv2-fmnist (scripts/mobilenetv2_fmnist.sh)
Training progress (last epochs):
TRAIN_METRICS: epoch=160 train_loss=0.4439 train_acc=83.04 test_loss=0.4286 test_acc=83.90 lr=0.009549
TRAIN_METRICS: epoch=170 train_loss=0.4219 t
Tool Input
{
  "n": 1
}
Undo complete: Restored: pytorch-vision/custom_norm.py
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_norm.py",
  "start_line": 31,
  "end_line": 45,
  "content": "class CustomNorm(nn.Module):\n    \"\"\"Gated Residual Normalization (GRN): GroupNorm + residual skip.\"\"\"\n    def __init__(self, num_features):\n        super().__init__()\n        g = min(32, num_features)\n        while num_features % g: g -= 1\n        self.norm = nn.GroupNorm(g, num_features)\n        self.gate = nn.Parameter(torch.full((num_features,), 2.0))\n        self.weight = nn.Parameter(torch.ones(num_features))\n        self.bias = nn.Parameter(torch.zeros(num_features))\n    def forward(self, x):\n        n = self.norm(x)\n        g = torch.sigmoid(self.gate.view(1,-1,1,1))\n        return (g*n + (1-g)*x) * self.weight.view(1,-1,1,1) + self.bias.view(1,-1,1,1)"
}
ERROR: Lines 31..45 exceed the editable range. You may only edit lines 31–44.
[Current file: pytorch-vision/custom_norm.py | editable: 31–44 | total: 430 lines]
28| # ============================================================================
29|
30| # -- EDITABLE REGION START (lines 31-45) --------------------------------------
31 class CustomNorm(nn.Module):
32     """Gated Residual Normalization (GRN): GroupNorm + residual skip."""
33     def __init__(self, num_features):
34         super().__init__()
35         g = min(32, num_features)
36         while num_features % g: g -= 1
37         self.norm = nn.GroupNorm(g, num_features)
38         self.gate = nn.Parameter(torch.full((num_features,), 2.0))
39         self.weight = nn.Parameter(torch.ones(num_features))
40         self.bias = nn.Parameter(torch.zeros(num_features))
41     def forward(self, x):
42         n = self.norm(x)
43         g = torch.sig
Tool Input
{
  "op": "replace",
  "filename": "pytorch-vision/custom_norm.py",
  "start_line": 31,
  "end_line": 44,
  "content": "class CustomNorm(nn.Module):\n    \"\"\"Channel-Adaptive Normalization (CAN).\"\"\"\n    def __init__(self, num_features):\n        super().__init__()\n        g = min(32, num_features)\n        while num_features % g: g -= 1\n        self.norm = nn.GroupNorm(g, num_features)\n        self.adapt_weight = nn.Parameter(torch.zeros(num_features))\n        self.adapt_bias = nn.Parameter(torch.zeros(num_features))\n        self.weight = nn.Parameter(torch.ones(num_features))\n        self.bias = nn.Parameter(torch.zeros(num_features))\n    def forward(self, x):\n        n = self.norm(x)\n        pooled = x.abs().mean(dim=(2,3), keepdim=True)\n        scale = 1 + 0.5 * torch.tanh(self.adapt_weight.view(1,-1,1,1)*pooled + self.adapt_bias.view(1,-1,1,1))\n        return (n * scale) * self.weight.view(1,-1,1,1) + self.bias.view(1,-1,1,1)"
}
OK: Replaced lines 31..44 with 16 line(s) in pytorch-vision/custom_norm.py. Editable range: 31–46.
[Current file: pytorch-vision/custom_norm.py | editable: 31–46 | total: 432 lines]
28| # ============================================================================
29|
30| # -- EDITABLE REGION START (lines 31-45) --------------------------------------
31 class CustomNorm(nn.Module):
32     """Channel-Adaptive Normalization (CAN)."""
33     def __init__(self, num_features):
34         super().__init__()
35         g = min(32, num_features)
36         while num_features % g: g -= 1
37         self.norm = nn.GroupNorm(g, num_features)
38         self.adapt_weight = nn.Parameter(torch.zeros(num_features))
39         self.adapt_bias = nn.Parameter(torch.zeros(num_features))
40         self.weight = nn.Parameter(torch.ones(num_features))
41         self.bias = nn.Parameter(torch.zeros(num_features))
42     def forward(s