optimization-gradient-compression

Tags: Optimization · pytorch-vision · rigorous codebase

Description

Gradient Compression for Communication-Efficient Distributed Training

Research Question

Design a gradient compression operator that reduces communication cost in distributed training while maintaining convergence quality (test accuracy).

Background

In distributed data-parallel training, gradient communication is often the bottleneck. Workers compute local gradients, which must be aggregated (e.g., via all-reduce) before the optimizer step. Gradient compression reduces the volume of data communicated by applying lossy compression to gradients before transmission.

Three main families of compression exist:

  • Sparsification: Keep only a subset of gradient elements (e.g., TopK selects the largest magnitudes)
  • Quantization: Reduce the precision of gradient values (e.g., QSGD uses stochastic rounding to discrete levels)
  • Low-rank approximation: Approximate gradient matrices with low-rank factors (e.g., PowerSGD)
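To make the first two families concrete, here is a minimal sketch of TopK sparsification and QSGD-style stochastic quantization. The function names and the `ratio`/`levels` parameters are illustrative choices, not part of the benchmark's API.

```python
import torch

def topk_sparsify(grad, ratio=0.01):
    """Sparsification: keep only the largest-magnitude `ratio` fraction of elements."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    values = flat[indices]          # keep the signed values at those positions
    return values, indices          # this is what would be communicated

def qsgd_quantize(grad, levels=16):
    """Quantization: QSGD-style unbiased stochastic rounding to `levels` uniform levels."""
    norm = grad.abs().max()
    if norm == 0:
        return torch.zeros_like(grad)
    scaled = grad.abs() / norm * levels
    lower = scaled.floor()
    # round up with probability equal to the fractional part, so E[output] = input
    rounded = lower + (torch.rand_like(scaled) < (scaled - lower)).float()
    return grad.sign() * rounded / levels * norm
```

TopK transmits only `values` and `indices` (plus the shape), while QSGD transmits low-precision level indices and one scale per tensor; both trade reconstruction error for bandwidth.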

A key challenge is that naive compression introduces bias or variance that degrades convergence. Error feedback (accumulating compression residuals and adding them back to the gradient at the next iteration) is a widely used technique to correct this.
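The error-feedback update can be sketched in a few lines. This is a generic sketch, not the benchmark's reference code; `compress_fn` stands in for any lossy compressor and `memory` is the per-parameter residual buffer.

```python
import torch

def compress_with_error_feedback(grad, memory, compress_fn):
    """One error-feedback step: add the carried residual, compress,
    and store whatever the compressor dropped for the next iteration."""
    corrected = grad + memory             # add residual from the previous step
    compressed = compress_fn(corrected)   # lossy compression of the corrected gradient
    memory.copy_(corrected - compressed)  # remember what was lost
    return compressed
```

Because dropped mass is re-injected later, small-but-persistent gradient coordinates are eventually transmitted instead of being silenced forever, which is why error feedback restores convergence under aggressive compression.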

Task

Modify the Compressor class in custom_compressor.py. Your compressor must implement:

  • __init__(self, compress_ratio): Initialize with a target compression ratio (0.01 = 100x compression)
  • compress(self, tensor, name): Compress a gradient tensor, returning (compressed_tensors, ctx)
  • decompress(self, compressed_tensors, ctx): Reconstruct the gradient

The compressor may maintain internal state (e.g., error-feedback residuals) across calls. The name argument uniquely identifies each parameter, enabling per-parameter state tracking.

Interface

class Compressor:
    def __init__(self, compress_ratio=0.01): ...
    def compress(self, tensor, name) -> (list[Tensor], ctx): ...
    def decompress(self, compressed_tensors, ctx) -> Tensor: ...
  • compress_ratio: Fraction of gradient elements/information to retain (0.01 = keep 1%)
  • compressed_tensors: List of tensors that would be communicated over the network
  • ctx: Local context (not communicated) needed for decompression
  • The decompressed tensor must have the same shape as the original input

Evaluation

Each submitted compressor is trained and evaluated on three settings at 100x compression (compress_ratio=0.01):

  • ResNet-20 / CIFAR-10 (0.27M params): Small model, standard benchmark
  • VGG-11-BN / CIFAR-100 (9.8M params): Larger model, harder 100-class problem
  • ResNet-56 / CIFAR-10 (0.85M params): Deeper model, tests scalability

Metric: best test accuracy (higher is better). All settings use SGD with momentum, cosine LR schedule, and 200 training epochs.
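The stated optimizer and schedule can be set up as follows. The learning rate and momentum values here (lr=0.1, momentum=0.9) are common CIFAR defaults assumed for illustration; the benchmark does not specify them in this description.

```python
import torch

# Assumed hyperparameters: lr=0.1 and momentum=0.9 are common CIFAR defaults,
# not values taken from the benchmark description.
model = torch.nn.Linear(10, 10)  # placeholder for ResNet-20 / VGG-11-BN / ResNet-56
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Cosine LR schedule annealed over the full 200-epoch budget
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
```

With `T_max=200` the learning rate decays from its initial value to near zero by the final epoch, following a half cosine.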

Code

custom_compressor.py
"""Gradient Compression for Communication-Efficient Distributed Training.

Self-contained benchmark: trains standard vision models on CIFAR datasets
using data-parallel SGD with a pluggable gradient compressor.

The script simulates distributed training on a single node by:
1. Computing gradients normally
2. Applying compress() -> decompress() to each gradient (simulating communication)
3. Using the decompressed gradient for the optimizer step

This faithfully measures the effect of gradient compression on convergence
quality, which is the core ML-science question, without requiring multi-node
infrastructure.
"""
Results

| Model | Type | best acc (resnet20-cifar10) | test loss (resnet20-cifar10) | best acc (vgg11-cifar100) | test loss (vgg11-cifar100) | best acc (resnet56-cifar10) | test loss (resnet56-cifar10) |
|---|---|---|---|---|---|---|---|
| qsgd | baseline | 90.407 | 0.316 | 47.683 | 2.044 | 94.010 | 0.261 |
| signsgd | baseline | 92.517 | 0.272 | 70.767 | 1.451 | 94.143 | 0.257 |
| topk_ef | baseline | 92.247 | 0.276 | 70.243 | 1.470 | 93.850 | 0.258 |
| anthropic/claude-opus-4.6 | vanilla | - | - | 70.370 | 1.492 | 94.183 | 0.240 |
| deepseek-reasoner | vanilla | - | - | 70.117 | 1.481 | 93.925 | 0.249 |
| google/gemini-3.1-pro-preview | vanilla | - | - | 70.370 | 1.459 | 94.163 | 0.244 |
| openai/gpt-5.4-pro | vanilla | 92.580 | 0.277 | 70.400 | 1.444 | 94.073 | 0.256 |
| qwen3.6-plus:free | vanilla | 90.763 | 0.306 | 66.957 | 1.716 | 92.133 | 0.295 |
| anthropic/claude-opus-4.6 | agent | 92.717 | 0.269 | 70.973 | 1.437 | 94.200 | 0.244 |
| deepseek-reasoner | agent | - | - | - | - | - | - |
| google/gemini-3.1-pro-preview | agent | - | - | 70.460 | 1.402 | 94.077 | 0.261 |
| openai/gpt-5.4-pro | agent | 92.787 | 0.266 | 70.720 | 1.450 | 94.117 | 0.257 |
| qwen3.6-plus:free | agent | - | - | 67.527 | 1.706 | 93.520 | 0.266 |

Agent Conversations