cv-sample-weighting

Tags: Computer Vision · pytorch-vision · rigorous codebase

Description

CV Sample Reweighting Strategy Design

Research Question

Design a novel sample reweighting strategy for class-imbalanced image classification that improves test accuracy on long-tailed datasets, across different architectures and imbalance ratios.

Background

Real-world datasets often exhibit long-tail class distributions where a few head classes dominate while many tail classes have very few samples. Standard training with uniform loss weighting biases the model toward frequent classes, degrading performance on rare ones. Sample reweighting assigns per-class weights to the cross-entropy loss to counteract this imbalance. Classic approaches include:

  • Inverse frequency: weight[c] = total / (C * count[c]) — directly compensates for imbalance
  • Effective number (Cui et al., CVPR 2019): models data overlap via the effective number E_n = (1 - beta^n) / (1 - beta) and weights each class in proportion to 1/E_n
  • Square-root inverse: weight[c] = 1/sqrt(count[c]) — a gentler smoothed variant

These methods define different mappings from class frequency to loss weight and may behave differently across datasets and imbalance regimes; all three are sketched in the code below.
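For concreteness, here is a minimal sketch of the three mappings (the function name classic_weights and the sum-to-C normalization are illustrative choices, not part of the benchmark code):

import torch

def classic_weights(class_counts, scheme="effective_number", beta=0.9999):
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    num_classes = counts.numel()
    if scheme == "inverse_freq":
        # weight[c] = total / (C * count[c])
        weights = counts.sum() / (num_classes * counts)
    elif scheme == "sqrt_inverse":
        # gentler smoothed variant: weight[c] = 1 / sqrt(count[c])
        weights = 1.0 / counts.sqrt()
    elif scheme == "effective_number":
        # E_n = (1 - beta^n) / (1 - beta); weight proportional to 1/E_n
        effective_num = (1.0 - beta ** counts) / (1.0 - beta)
        weights = 1.0 / effective_num
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Normalize so the weights sum to C (one common convention)
    return weights * num_classes / weights.sum()

For example, classic_weights([500, 50, 5], "inverse_freq") assigns the rarest class the largest weight, counteracting its underrepresentation in the loss.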

What You Can Modify

Only the compute_class_weights(class_counts, num_classes, config) function (lines 164-195 of custom_weighting.py) is editable. It receives per-class sample counts and must return a weight tensor for nn.CrossEntropyLoss(weight=...); a hedged skeleton appears after the list below.

You can modify:

  • The functional form mapping class counts to weights (inverse, power-law, logarithmic, piecewise, etc.)
  • Use of the config dict: imbalance_ratio, dataset, arch, total_samples
  • Normalization strategy (sum to C, sum to 1, unnormalized, etc.)
  • Any pure-computation logic (no access to training data or model parameters)
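A skeleton of the editable function, using the signature and config keys given above (the power-law mapping and the gamma values are assumptions for illustration, not a reference solution):

import torch

def compute_class_weights(class_counts, num_classes, config):
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    # Illustrative choice: a power-law inverse-frequency mapping whose
    # exponent softens under milder imbalance (config keys per the task:
    # imbalance_ratio, dataset, arch, total_samples).
    gamma = 1.0 if config.get("imbalance_ratio", 100) >= 100 else 0.5
    weights = counts.pow(-gamma)
    # One of several valid normalizations: make the weights sum to C
    return weights * num_classes / weights.sum()

Any pure function of the counts and config fits the same slot; the normalization line can equally be swapped for sum-to-1 or dropped entirely.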

Evaluation

  • Metric: Best test accuracy (%, higher is better) on the balanced test set
  • Benchmarks (all long-tail imbalanced):
    • ResNet-32 on CIFAR-10-LT (imbalance ratio = 100, 10 classes)
    • ResNet-32 on CIFAR-100-LT (imbalance ratio = 100, 100 classes)
    • VGG-16-BN on CIFAR-100-LT (imbalance ratio = 50, 100 classes) — hidden, evaluated on final submission only
  • Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
  • Data augmentation: RandomCrop(32, pad=4) + RandomHorizontalFlip (see the sketch after this list)
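Under these settings, the fixed recipe corresponds roughly to the following sketch (the helper name build_training and the placeholder arguments are illustrative, not taken from the benchmark code):

import torch
import torchvision.transforms as T

def build_training(model, class_weights, epochs=200):
    # SGD with the stated hyperparameters (lr=0.1, momentum=0.9, wd=5e-4)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    # Cosine annealing over the full 200-epoch schedule
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    # Per-class weights plugged into the loss, as the task specifies
    criterion = torch.nn.CrossEntropyLoss(weight=class_weights)
    return optimizer, scheduler, criterion

# Augmentation as stated above
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])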

Code

custom_weighting.py
1"""CV Sample Reweighting Benchmark.
2
3Train vision models (ResNet-32, VGG-16-BN) on long-tail imbalanced CIFAR
4to evaluate sample reweighting strategies for class-imbalanced classification.
5
6FIXED: Model architectures, imbalanced dataset creation, data pipeline, training loop.
7EDITABLE: compute_class_weights() function.
8
9Usage:
10 python custom_weighting.py --arch resnet32 --dataset cifar10 --imbalance-ratio 100 --seed 42
11"""
12
13import argparse
14import math
15import os

Results

Model | Type | test acc resnet32-cifar10lt | test acc resnet32-cifar100lt | test acc vgg16bn-cifar100lt
balanced_softmax | baseline | 73.590 | 39.420 | 45.360
effective_number | baseline | 73.130 | 40.160 | 44.820
inverse_freq | baseline | 71.790 | 35.550 | 43.910
anthropic/claude-opus-4.6 | vanilla | 72.100 | 39.370 | 45.330
deepseek-reasoner | vanilla | 72.610 | 39.450 | 47.060
google/gemini-3.1-pro-preview | vanilla | 73.460 | 37.850 | 43.360
openai/gpt-5.4 | vanilla | 72.700 | 39.530 | 46.200
qwen/qwen3.6-plus | vanilla | 72.140 | 39.870 | 47.230
anthropic/claude-opus-4.6 | agent | 72.100 | 39.370 | 45.330
deepseek-reasoner | agent | 72.560 | 40.480 | 47.260
google/gemini-3.1-pro-preview | agent | 73.460 | 37.850 | 43.360
openai/gpt-5.4 | agent | 74.740 | 38.690 | 45.970
qwen/qwen3.6-plus | agent | 72.140 | 39.870 | 47.230

Agent Conversations