cv-multitask-loss
Description
CV Multi-Task Loss Combination Strategy Design
Research Question
Design a novel multi-task loss combination strategy for jointly training fine-grained (100-class) and coarse (20-superclass) classification on CIFAR-100 that maximizes fine-class test accuracy.
Background
CIFAR-100 contains 100 fine classes organized into 20 coarse superclasses. Training a model with two classification heads (fine + coarse) provides a natural multi-task learning setup where the coarse task acts as an auxiliary signal. The key challenge is how to combine the two losses effectively.
Classic approaches include:
- Equal weighting: Simply sum the losses (baseline default)
- Uncertainty weighting (Kendall et al., 2018): Learn task-specific uncertainty as log-variance parameters
- Dynamic Weight Average (Liu et al., 2019): Weight tasks by their relative loss change rate
- PCGrad (Yu et al., NeurIPS 2020): Project conflicting task gradients onto each other's normal plane to reduce gradient interference
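Of these, uncertainty weighting is the simplest to sketch in isolation. A minimal PyTorch version (illustrative only; the class name and initialization are assumptions, not code from this benchmark) learns one log-variance per task and down-weights noisier tasks:

```python
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    """Kendall et al. (2018): weight each task loss by a learned
    homoscedastic uncertainty, parameterized as a log-variance s_i.
    Combined loss = sum_i( exp(-s_i) * L_i + s_i / 2 )."""

    def __init__(self, num_tasks: int = 2):
        super().__init__()
        # One log-variance per task, initialized to 0 (unit variance).
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, *losses: torch.Tensor) -> torch.Tensor:
        total = losses[0].new_zeros(())
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])
            # Regularizer s_i / 2 prevents the trivial solution s_i -> inf.
            total = total + precision * loss + 0.5 * self.log_vars[i]
        return total
```

Because `log_vars` is an `nn.Parameter`, it is trained by the same optimizer as the model, which is exactly what this benchmark's setup allows.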
The coarse labels encode a semantic hierarchy. The task is to balance the auxiliary coarse signal against the primary fine-class objective across different architectures and training stages.
What You Can Modify
The MultiTaskLoss class (lines 195-216) in custom_mtl.py. This class receives individual task losses and must combine them into a single scalar loss.
You can modify:
- The `__init__` method: add learnable parameters (log-variances, weights, etc.)
- The `forward` method: implement any combination strategy
- Use `epoch` and `total_epochs` for curriculum/scheduling approaches
- Add any auxiliary state (e.g., loss history buffers)
The forward method receives:
- `fine_loss`: scalar tensor, cross-entropy for the 100-class fine prediction
- `coarse_loss`: scalar tensor, cross-entropy for the 20-class coarse prediction
- `epoch`: int, current epoch (0-indexed)
- `total_epochs`: int, total number of training epochs
Note: The MultiTaskLoss parameters are included in the optimizer, so learnable parameters will be trained.
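One simple strategy that fits this interface is an epoch-based curriculum: keep the fine loss at full weight and anneal the coarse weight to zero, so the auxiliary signal helps early and fades as the fine head converges. The cosine schedule below is an illustrative choice, not part of the benchmark:

```python
import math
import torch
import torch.nn as nn

class MultiTaskLoss(nn.Module):
    """Example strategy: fine loss at full weight, coarse loss annealed
    from 1.0 to 0.0 with a cosine schedule over training."""

    def forward(self, fine_loss, coarse_loss, epoch, total_epochs):
        # Cosine decay: w_coarse = 1 at epoch 0, 0 at the final epoch.
        w_coarse = 0.5 * (1.0 + math.cos(math.pi * epoch / max(total_epochs - 1, 1)))
        return fine_loss + w_coarse * coarse_loss
```

This variant has no learnable parameters, so including it in the optimizer is a no-op; a learnable strategy would simply register `nn.Parameter`s in `__init__` as noted above.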
Evaluation
- Metric: Best fine-class test accuracy (%, higher is better)
- Architectures (all on CIFAR-100 with fine+coarse heads):
- ResNet-20 (shallow residual network)
- ResNet-56 (deeper residual network)
- VGG-16-BN (deep non-residual with BatchNorm) — hidden, evaluated on final submission only
- Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
- Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
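The fixed training recipe above maps onto a standard PyTorch setup roughly as follows (a sketch; the function name is illustrative and the data-augmentation pipeline is omitted):

```python
import torch
from torch import nn, optim

def make_optimizer(model: nn.Module, mtl_loss: nn.Module, total_epochs: int = 200):
    """Fixed recipe from the benchmark: SGD(lr=0.1, momentum=0.9,
    wd=5e-4) with cosine annealing over 200 epochs. The MultiTaskLoss
    parameters are optimized jointly with the model's, as the task notes."""
    params = list(model.parameters()) + list(mtl_loss.parameters())
    opt = optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=5e-4)
    sched = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_epochs)
    return opt, sched
```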
Code
1"""CV Multi-Task Loss Benchmark.23Train vision models (ResNet, VGG) on CIFAR-100 with TWO classification heads4(fine: 100 classes, coarse: 20 superclasses) to evaluate multi-task loss5combination strategies.67FIXED: Model architectures, data pipeline, training loop.8EDITABLE: MultiTaskLoss class.910Usage:11python custom_mtl.py --arch resnet20 --seed 4212"""1314import argparse15import math
Results
| Model | Type | resnet20-cifar100mt test acc (%) ↑ | resnet56-cifar100mt test acc (%) ↑ | vgg16bn-cifar100mt test acc (%) ↑ |
|---|---|---|---|---|
| dwa | baseline | 67.960 | 72.390 | 73.780 |
| pcgrad | baseline | 64.310 | 70.200 | 74.170 |
| uncertainty | baseline | 66.810 | 70.940 | 72.670 |
| anthropic/claude-opus-4.6 | vanilla | 68.600 | 72.660 | 73.800 |
| deepseek-reasoner | vanilla | 66.550 | 71.760 | 72.290 |
| google/gemini-3.1-pro-preview | vanilla | 68.480 | 71.080 | 74.080 |
| openai/gpt-5.4 | vanilla | 69.110 | 72.120 | 74.360 |
| qwen/qwen3.6-plus | vanilla | 68.290 | 72.380 | 73.460 |
| anthropic/claude-opus-4.6 | agent | 68.600 | 72.660 | 73.800 |
| deepseek-reasoner | agent | 67.100 | 71.900 | 72.750 |
| google/gemini-3.1-pro-preview | agent | 68.870 | 72.640 | 74.550 |
| openai/gpt-5.4 | agent | 68.960 | 72.370 | 74.310 |
| qwen/qwen3.6-plus | agent | 68.710 | 72.700 | 74.010 |