cv-data-augmentation
Description
CV Data Augmentation Strategy Design
Research Question
Design a novel data augmentation strategy for image classification that improves test accuracy across different architectures and datasets.
Background
Data augmentation is a key regularization technique for training deep neural networks on limited data. By applying transformations to training images, augmentation increases effective dataset diversity and reduces overfitting. Classic and modern methods include:
- Standard (baseline): RandomCrop with padding + RandomHorizontalFlip — the minimal CIFAR augmentation
- Cutout (DeVries & Taylor, 2017): Randomly masks square regions, forcing the network to use broader spatial context
- RandAugment (Cubuk et al., 2020): Applies N randomly selected operations at uniform magnitude M, avoiding expensive search
- TrivialAugmentWide (Mueller & Hutter, 2021): Single random operation with random magnitude per image, zero hyperparameters
These methods make different choices about geometric, photometric, and masking transforms, and may behave differently across datasets and model families.
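To make the masking family concrete, here is a minimal NumPy sketch of Cutout's core step (an illustration only; the original operates inside the training pipeline on image tensors, and the `size=8` default here is an arbitrary choice for a 32×32 image):

```python
import numpy as np

def cutout(img: np.ndarray, size: int = 8, rng=None) -> np.ndarray:
    """Zero out a random `size` x `size` square (Cutout-style masking).

    `img` is an H x W x C array. The square's center is sampled uniformly,
    so the mask may be clipped at the image border, as in the original paper.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = img.copy()
    out[y0:y1, x0:x1] = 0  # masked region forces reliance on surrounding context
    return out
```

Because the center (not the box) is clipped to the image, the effective masked area shrinks near borders, which DeVries & Taylor found works better than always masking a full square.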
What You Can Modify
Only the build_train_transform(config) function (lines 165-194) in custom_augment.py is editable. It receives a config dict and must return a transforms.Compose pipeline that includes ToTensor() and Normalize().
You can modify:
- Which geometric transforms to apply (crop, flip, rotation, affine, perspective)
- Which photometric transforms to apply (color jitter, equalize, posterize, solarize)
- Erasing/masking strategies (cutout, random erasing)
- Automated augmentation policies (AutoAugment, RandAugment, TrivialAugment)
- Custom transform classes (defined inside the function)
- The ordering and composition of transforms
- Any transform that operates on PIL images (before ToTensor) or tensors (after ToTensor)
The config dict provides: img_size (32), mean (tuple), std (tuple), dataset ('cifar10' or 'cifar100'). You may use dataset-specific augmentation if desired.
Important: The pipeline MUST include transforms.ToTensor() and transforms.Normalize(config['mean'], config['std']) to produce properly normalized tensors for the model.
Evaluation
- Metric: Best test accuracy (%, higher is better)
- Architectures & datasets:
- ResNet-20 on CIFAR-10 (shallow residual, 10 classes)
- ResNet-56 on CIFAR-100 (deeper residual, 100 classes)
- MobileNetV2 on FashionMNIST (lightweight inverted-residual, 10 classes) — hidden, evaluated on final submission only
- Training: SGD (lr=0.1, momentum=0.9, wd=5e-4), cosine annealing, 200 epochs
- Weight init: Standard Kaiming normal (fixed)
Code
```python
"""CV Data Augmentation Benchmark.

Train vision models (ResNet, VGG, MobileNetV2) on CIFAR-10/100/FashionMNIST to evaluate
data augmentation strategies.

FIXED: Model architectures, weight initialization, test transform, data loading, training loop.
EDITABLE: build_train_transform() function.

Usage:
    python custom_augment.py --arch resnet20 --dataset cifar10 --seed 42
"""

import argparse
import math
import os
```
Results
| Model | Type | Test acc (%) ResNet-20 / CIFAR-10 ↑ | Test acc (%) ResNet-56 / CIFAR-100 ↑ | Test acc (%) MobileNetV2 / FashionMNIST ↑ |
|---|---|---|---|---|
| cutout | baseline | 93.720 | 74.540 | 94.720 |
| randaugment | baseline | 93.510 | 74.350 | 94.430 |
| trivialaugment | baseline | 93.360 | 74.710 | 94.240 |
| anthropic/claude-opus-4.6 | vanilla | 93.070 | 75.390 | 94.130 |
| deepseek-reasoner | vanilla | 92.450 | 73.110 | 93.810 |
| google/gemini-3.1-pro-preview | vanilla | 93.120 | 75.890 | 94.390 |
| openai/gpt-5.4 | vanilla | 92.990 | 73.950 | 94.920 |
| qwen/qwen3.6-plus | vanilla | 89.810 | 69.380 | 94.290 |
| anthropic/claude-opus-4.6 | agent | 93.530 | 75.320 | 94.130 |
| deepseek-reasoner | agent | 92.810 | 73.870 | 94.260 |
| google/gemini-3.1-pro-preview | agent | 93.120 | 75.890 | 94.390 |
| openai/gpt-5.4 | agent | 93.170 | 74.860 | 94.740 |
| qwen/qwen3.6-plus | agent | 89.810 | 69.380 | 94.290 |