cv-data-augmentation
Description
CV Data Augmentation Strategy Design
Research Question
Design a novel data augmentation strategy for image classification that improves test accuracy across different architectures and datasets.
Background
Data augmentation is a key regularization technique for training deep neural networks on limited data. By applying transformations to training images, augmentation increases effective dataset diversity and reduces overfitting. Classic and modern methods include:
- Standard (baseline): RandomCrop with padding + RandomHorizontalFlip — the minimal CIFAR augmentation
- Cutout (DeVries & Taylor, 2017): Randomly masks square regions, forcing the network to use broader spatial context
- RandAugment (Cubuk et al., 2020): Applies N randomly selected operations at uniform magnitude M, avoiding expensive search
- TrivialAugmentWide (Mueller & Hutter, 2021): Single random operation with random magnitude per image, zero hyperparameters
These methods make different choices about geometric, photometric, and masking transforms, and may behave differently across datasets and model families.
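To make the masking family concrete, here is a minimal NumPy sketch of Cutout's core step (an illustration only; the original operates inside the training pipeline on image tensors, and the `size=8` default here is an arbitrary choice for a 32×32 image):

```python
import numpy as np

def cutout(img: np.ndarray, size: int = 8, rng=None) -> np.ndarray:
    """Zero out a random `size` x `size` square (Cutout-style masking).

    `img` is an H x W x C array. The square's center is sampled uniformly,
    so the mask may be clipped at the image border, as in the original paper.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = img.copy()
    out[y0:y1, x0:x1] = 0  # masked region forces reliance on surrounding context
    return out
```

Because the center (not the box) is clipped to the image, the effective masked area shrinks near borders, which DeVries & Taylor found works better than always masking a full square.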
What You Can Modify
Only the build_train_transform(config) function (lines 165-194) in custom_augment.py is editable. It receives a config dict and must return a transforms.Compose pipeline that includes ToTensor() and Normalize().
You can modify:
- Which geometric transforms to apply (crop, flip, rotation, affine, perspective)
- Which photometric transforms to apply (color jitter, equalize, posterize, solarize)
- Erasing/masking strategies (cutout, random erasing)
- Automated augmentation policies (AutoAugment, RandAugment, TrivialAugment)
- Custom transform classes (defined inside the function)
- The ordering and composition of transforms
- Any transform that operates on PIL images (before ToTensor) or tensors (after ToTensor)
The config dict provides: img_size (32), mean (tuple), std (tuple), dataset ('cifar10' or 'cifar100'). You may use dataset-specific augmentation if desired.
Important: The pipeline MUST include transforms.ToTensor() and transforms.Normalize(config['mean'], config['std']) to produce properly normalized tensors for the model.
Evaluation
- Metric: Best test accuracy (%, higher is better)
- Architectures & datasets:
- ResNet-20 on CIFAR-10 (shallow residual, 10 classes)
- ResNet-56 on CIFAR-100 (deeper residual, 100 classes)
- MobileNetV2 on FashionMNIST (lightweight inverted-residual, 10 classes) — hidden, evaluated on final submission only
- Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
- Weight init: Standard Kaiming normal (fixed)
Code
```python
"""CV Data Augmentation Benchmark.

Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST to evaluate
data augmentation strategies.

FIXED: Model architectures, weight initialization, test transform, data loading, training loop.
EDITABLE: build_train_transform() function.

Usage:
    python custom_augment.py --arch resnet20 --dataset cifar10 --seed 42
"""

import argparse
import math
import os
```
Results
| Model | Type | Test acc (%) ResNet-20 / CIFAR-10 ↑ | Test acc (%) ResNet-56 / CIFAR-100 ↑ | Test acc (%) MobileNetV2 / FashionMNIST ↑ |
|---|---|---|---|---|
| cutout | baseline | 93.720 | 74.540 | 94.720 |
| randaugment | baseline | 93.510 | 74.350 | 94.430 |
| trivialaugment | baseline | 93.360 | 74.710 | 94.240 |
| anthropic/claude-opus-4.6 | vanilla | 93.070 | 75.390 | 94.130 |
| deepseek-reasoner | vanilla | 92.450 | 73.110 | 93.810 |
| google/gemini-3.1-pro-preview | vanilla | 93.120 | 75.890 | 94.390 |
| openai/gpt-5.4 | vanilla | 92.990 | 73.950 | 94.920 |
| qwen/qwen3.6-plus | vanilla | 89.810 | 69.380 | 94.290 |
| anthropic/claude-opus-4.6 | agent | 93.530 | 75.320 | 94.130 |
| deepseek-reasoner | agent | 92.810 | 73.870 | 94.260 |
| google/gemini-3.1-pro-preview | agent | 93.120 | 75.890 | 94.390 |
| openai/gpt-5.4 | agent | 93.170 | 74.860 | 94.740 |
| qwen/qwen3.6-plus | agent | 89.810 | 69.380 | 94.290 |