cv-multitask-loss

Tags: Computer Vision · pytorch-vision · rigorous codebase

Description

CV Multi-Task Loss Combination Strategy Design

Research Question

Design a novel multi-task loss combination strategy for jointly training fine-grained (100-class) and coarse (20-superclass) classification on CIFAR-100 that maximizes fine-class test accuracy.

Background

CIFAR-100 contains 100 fine classes organized into 20 coarse superclasses. Training a model with two classification heads (fine + coarse) provides a natural multi-task learning setup where the coarse task acts as an auxiliary signal. The key challenge is how to combine the two losses effectively.

Classic approaches include:

  • Equal weighting: Simply sum the losses (baseline default)
  • Uncertainty weighting (Kendall et al., 2018): Learn task-specific uncertainty as log-variance parameters
  • Dynamic Weight Average (Liu et al., 2019): Weight tasks by their relative loss change rate
  • PCGrad (Yu et al., NeurIPS 2020): Project conflicting task gradients onto each other's normal plane to reduce gradient interference
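As a concrete illustration of one of the approaches above, here is a minimal sketch of uncertainty weighting in the style of Kendall et al. (2018). The class name `UncertaintyWeighting` and the two-task setup are assumptions for illustration; each task loss is scaled by `exp(-s)` with a `0.5 * s` regularizer, where `s` is a learnable log-variance (at initialization `s = 0`, so this reduces to equal weighting):

```python
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    """Sketch of Kendall et al. (2018) uncertainty weighting.

    Each task i gets a learnable log-variance s_i; its loss is scaled
    by exp(-s_i) and regularized by 0.5 * s_i, so the model can
    down-weight noisier tasks during training.
    """

    def __init__(self, num_tasks=2):
        super().__init__()
        # One learnable log-variance per task, initialized to zero.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, *losses):
        total = torch.zeros(())
        for loss, log_var in zip(losses, self.log_vars):
            total = total + torch.exp(-log_var) * loss + 0.5 * log_var
        return total
```

Because the log-variances are `nn.Parameter`s, registering this module's parameters with the optimizer (as the benchmark does for `MultiTaskLoss`) lets the task weights adapt during training.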

The coarse labels encode semantic hierarchy. The task is to balance the auxiliary coarse signal against the primary fine-class objective across different architectures and training stages.

What You Can Modify

The MultiTaskLoss class (lines 195-216) in custom_mtl.py. This class receives individual task losses and must combine them into a single scalar loss.

You can modify:

  • The __init__ method: add learnable parameters (log-variances, weights, etc.)
  • The forward method: implement any combination strategy
  • Use epoch and total_epochs for curriculum/scheduling approaches
  • Add any auxiliary state (e.g., loss history buffers)

The forward method receives:

  • fine_loss: scalar tensor, cross-entropy for 100-class fine prediction
  • coarse_loss: scalar tensor, cross-entropy for 20-class coarse prediction
  • epoch: int, current epoch (0-indexed)
  • total_epochs: int, total number of training epochs

Note: The MultiTaskLoss parameters are included in the optimizer, so learnable parameters will be trained.
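To make the interface concrete, here is an illustrative sketch of a `MultiTaskLoss` matching the signature described above. The argument names follow the stated interface; the scheduling rule itself (a cosine warm-down on the coarse weight, so the auxiliary signal helps early and fades near convergence) is an example strategy, not the benchmark's baseline:

```python
import math
import torch
import torch.nn as nn

class MultiTaskLoss(nn.Module):
    """Example combination strategy: cosine-annealed coarse weight.

    fine_loss / coarse_loss are scalar tensors; epoch is 0-indexed.
    The coarse weight decays smoothly from 1.0 at epoch 0 to 0.0 at
    the final epoch.
    """

    def __init__(self):
        super().__init__()

    def forward(self, fine_loss, coarse_loss, epoch, total_epochs):
        # Cosine schedule: w = 1.0 at epoch 0, w = 0.0 at the last epoch.
        w = 0.5 * (1.0 + math.cos(math.pi * epoch / max(1, total_epochs - 1)))
        return fine_loss + w * coarse_loss
```

A learnable variant would add parameters in `__init__` (e.g. log-variances), which the training loop would then optimize alongside the model weights.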

Evaluation

  • Metric: Best fine-class test accuracy (%, higher is better)
  • Architectures (all on CIFAR-100 with fine+coarse heads):
    • ResNet-20 (shallow residual network)
    • ResNet-56 (deeper residual network)
    • VGG-16-BN (deep non-residual with BatchNorm) — hidden, evaluated on final submission only
  • Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
  • Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip
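The optimizer and schedule above can be sketched as follows (the `nn.Linear` stand-in replaces the actual ResNet/VGG model, which is fixed by the benchmark; the hyperparameters are the ones stated):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # stand-in for the fixed ResNet/VGG with two heads

# SGD with the benchmark's stated hyperparameters.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4
)
# Cosine annealing over the full 200-epoch run (lr decays 0.1 -> 0).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
```

In the actual training loop, `scheduler.step()` is called once per epoch after `optimizer.step()`; the data augmentation (RandomCrop with padding 4 plus horizontal flip) is applied in the fixed data pipeline via torchvision transforms.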

Code

custom_mtl.py
```python
"""CV Multi-Task Loss Benchmark.

Train vision models (ResNet, VGG) on CIFAR-100 with TWO classification heads
(fine: 100 classes, coarse: 20 superclasses) to evaluate multi-task loss
combination strategies.

FIXED: Model architectures, data pipeline, training loop.
EDITABLE: MultiTaskLoss class.

Usage:
    python custom_mtl.py --arch resnet20 --seed 42
"""

import argparse
import math
```

Results

| Model | Type | test acc (resnet20-cifar100mt) | test acc (resnet56-cifar100mt) | test acc (vgg16bn-cifar100mt) |
|---|---|---|---|---|
| dwa | baseline | 67.960 | 72.390 | 73.780 |
| pcgrad | baseline | 64.310 | 70.200 | 74.170 |
| uncertainty | baseline | 66.810 | 70.940 | 72.670 |
| anthropic/claude-opus-4.6 | vanilla | 68.600 | 72.660 | 73.800 |
| deepseek-reasoner | vanilla | 66.550 | 71.760 | 72.290 |
| google/gemini-3.1-pro-preview | vanilla | 68.480 | 71.080 | 74.080 |
| openai/gpt-5.4 | vanilla | 69.110 | 72.120 | 74.360 |
| qwen/qwen3.6-plus | vanilla | 68.290 | 72.380 | 73.460 |
| anthropic/claude-opus-4.6 | agent | 68.600 | 72.660 | 73.800 |
| deepseek-reasoner | agent | 67.100 | 71.900 | 72.750 |
| google/gemini-3.1-pro-preview | agent | 68.870 | 72.640 | 74.550 |
| openai/gpt-5.4 | agent | 68.960 | 72.370 | 74.310 |
| qwen/qwen3.6-plus | agent | 68.710 | 72.700 | 74.010 |
