cv-classification-loss
Description
CV Classification Loss Function Design
Research Question
Design a novel classification loss function for deep convolutional neural networks that improves test accuracy across different architectures and datasets.
Background
The cross-entropy loss is the standard training objective for classification networks, but it has known limitations: it treats all misclassifications equally, pushes confidence on correct predictions toward 1 without enforcing any margin, and does not adapt to training dynamics. Researchers have proposed alternatives:
- Label Smoothing (Szegedy et al., 2016): Softens hard targets to prevent overconfidence, CE with targets = (1-eps)*one_hot + eps/C
- Focal Loss (Lin et al., ICCV 2017): Down-weights easy examples via (1-pt)^gamma modulation
- PolyLoss (Leng et al., ICLR 2022): Extends CE with polynomial correction terms, CE + eps*(1-pt)
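The three baselines above reduce to small modifications of cross-entropy. A minimal PyTorch sketch of the cited formulas (not the authors' reference implementations; hyperparameter defaults are illustrative):

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, eps=0.1):
    """CE against softened targets: (1 - eps) * one_hot + eps / C."""
    c = logits.size(1)
    log_p = F.log_softmax(logits, dim=1)
    smooth = torch.full_like(log_p, eps / c)
    smooth.scatter_(1, targets.unsqueeze(1), 1.0 - eps + eps / c)
    return -(smooth * log_p).sum(dim=1).mean()

def focal_loss(logits, targets, gamma=2.0):
    """CE down-weighted by (1 - p_t)^gamma so easy examples contribute less."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)  # probability assigned to the true class
    return ((1.0 - pt) ** gamma * ce).mean()

def poly1_loss(logits, targets, eps=1.0):
    """Poly-1: CE plus a first-order polynomial correction eps * (1 - p_t)."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)
    return (ce + eps * (1.0 - pt)).mean()
```

With `eps=0` (smoothing, Poly-1) or `gamma=0` (focal), each of these collapses back to plain cross-entropy, which is a useful sanity check when experimenting.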
However, these methods are either static or address only specific failure modes. There is room to design loss functions that combine multiple insights: confidence calibration, curriculum-style epoch adaptation, class-count awareness, or learned temperature scaling.
What You Can Modify
Only the compute_loss(logits, targets, config) function (lines 165-185) in custom_loss.py. This function receives raw logits, integer targets, and a config dict, and must return a differentiable scalar loss.
You can modify:
- The loss formulation (cross-entropy variants, margin losses, etc.)
- Confidence-based reweighting schemes
- Epoch-dependent curriculum strategies using config['epoch'] and config['total_epochs']
- Class-count-dependent behavior using config['num_classes']
- Temperature or logit scaling
- Auxiliary regularization terms (entropy, logit penalties, etc.)
The config dict provides: num_classes (int), epoch (int, 0-indexed), total_epochs (int).
Important: The evaluation loss (for test_loss reporting) always uses standard cross-entropy. Your loss function only affects training.
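To make the interface concrete, here is one hypothetical compute_loss design (not the benchmark's reference solution): label smoothing whose strength decays linearly over training, driven by the documented config keys.

```python
import torch
import torch.nn.functional as F

def compute_loss(logits, targets, config):
    """Illustrative training loss: curriculum-style label smoothing.

    Smoothing starts at 0.1 and decays to 0 by the final epoch, so late
    training optimizes plain cross-entropy (matching the evaluation loss).
    config['num_classes'] is also available for class-count-aware designs.
    """
    progress = config["epoch"] / max(config["total_epochs"] - 1, 1)
    eps = 0.1 * (1.0 - progress)
    # F.cross_entropy accepts a label_smoothing argument (PyTorch >= 1.10).
    return F.cross_entropy(logits, targets, label_smoothing=eps)
```

At epoch 0 this applies full smoothing; at the last epoch it equals standard cross-entropy, so the training objective converges toward the reported test_loss metric.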
Evaluation
- Metric: Best test accuracy (%, higher is better)
- Architectures & datasets:
- ResNet-56 on CIFAR-100 (deep residual, 100 classes)
- VGG-16-BN on CIFAR-100 (deep non-residual with BatchNorm, 100 classes)
- MobileNetV2 on FashionMNIST (lightweight inverted-residual, 10 classes) — hidden, evaluated on final submission only
- Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
- Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
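The fixed optimizer and schedule above can be sketched in PyTorch (the Linear model is a stand-in for the benchmark's fixed architectures; augmentation lives in the data pipeline and is omitted here):

```python
import torch

model = torch.nn.Linear(3 * 32 * 32, 100)  # placeholder for ResNet-56 etc.

# SGD with the stated hyperparameters: lr=0.1, momentum=0.9, wd=5e-4.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4
)

# Cosine annealing over the full 200-epoch budget; one scheduler.step()
# per epoch drives the learning rate from 0.1 down to ~0.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
```

With this setup the learning rate follows a half-cosine from 0.1 at epoch 0 to 0 at epoch 200, which is why late-training behavior of the loss function (e.g. decaying regularizers) can matter for the final accuracy.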
Code
```python
"""CV Classification Loss Benchmark.

Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST to evaluate
classification loss function designs.

FIXED: Model architectures, weight initialization, data pipeline, training loop.
EDITABLE: compute_loss() function.

Usage:
    python custom_loss.py --arch resnet20 --dataset cifar10 --seed 42
"""

import argparse
import math
import os
```
Results
| Model | Type | Test acc (%) ResNet-56 / CIFAR-100 ↑ | Test acc (%) VGG-16-BN / CIFAR-100 ↑ | Test acc (%) MobileNetV2 / FashionMNIST ↑ |
|---|---|---|---|---|
| focal_loss | baseline | 71.670 | 74.180 | 94.140 |
| label_smoothing | baseline | 71.360 | 74.670 | 94.820 |
| poly_loss | baseline | 71.560 | 74.060 | 94.740 |
| anthropic/claude-opus-4.6 | vanilla | 72.300 | 74.320 | 94.830 |
| deepseek-reasoner | vanilla | 72.660 | 74.680 | 94.680 |
| google/gemini-3.1-pro-preview | vanilla | 72.570 | 73.240 | 94.460 |
| openai/gpt-5.4 | vanilla | 73.100 | 74.950 | 94.730 |
| qwen/qwen3.6-plus | vanilla | 50.420 | 49.840 | 89.820 |
| anthropic/claude-opus-4.6 | agent | 72.340 | 74.320 | 94.830 |
| deepseek-reasoner | agent | 72.660 | 74.680 | 94.680 |
| google/gemini-3.1-pro-preview | agent | 72.570 | 73.240 | 94.460 |
| openai/gpt-5.4 | agent | 73.100 | 74.950 | 94.730 |
| qwen/qwen3.6-plus | agent | 50.420 | 49.840 | 89.820 |