optimization-nas
Description
Sample-Efficient Neural Architecture Search
Objective
Design and implement a novel sample-efficient NAS optimizer that discovers
high-performing architectures in the NAS-Bench-201 search space under a
strict query budget. Your code goes in the NASOptimizer class in
custom_nas_search.py. Three reference implementations (Random Search, REA,
and a BANANAS-style predictor-guided search) are provided as read-only.
Research Question
With only K = 30 architecture evaluations, how can a search strategy maximize the expected accuracy of the best-found architecture?
This is the regime in which real-world NAS is actually hard: the full benchmark contains 15,625 architectures, but the agent can only query 30 of them, so naïve enumeration is impossible and algorithmic differences are load-bearing. Sample-efficient NAS has been studied in BANANAS (White et al., AAAI 2021), NPENAS (Wang et al., TNNLS 2022), and NAS-Bench-Suite (White et al., 2022); these works consistently show a measurable gap between random search, regularized evolution, and predictor-guided methods at K ≤ 50.
Search Space
- NAS-Bench-201 cell: 4 nodes, 6 edges, 5 operations per edge
- Operations: `skip_connect`, `none`, `nor_conv_3x3`, `nor_conv_1x1`, `avg_pool_3x3`
- 5^6 = 15,625 architectures total
- An architecture is represented as a list of 6 integers in [0, 4]
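As an illustration of this representation, a minimal sketch (not part of the provided harness; the helper names are for illustration only) of sampling and decoding architectures:

```python
# Sketch: an architecture is a list of 6 op indices in [0, 4].
import random

OPS = ["skip_connect", "none", "nor_conv_3x3", "nor_conv_1x1", "avg_pool_3x3"]
NUM_EDGES = 6  # edges of the 4-node NAS-Bench-201 cell

def random_architecture(rng: random.Random) -> list[int]:
    """Uniformly sample one of the 5**6 = 15,625 architectures."""
    return [rng.randrange(len(OPS)) for _ in range(NUM_EDGES)]

def decode(arch: list[int]) -> list[str]:
    """Map op indices back to operation names."""
    return [OPS[i] for i in arch]

arch = random_architecture(random.Random(0))
names = decode(arch)
```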
Evaluation Protocol
- Datasets: CIFAR-10, CIFAR-100, ImageNet16-120 (three separate settings)
- Query budget: `NAS_EPOCHS = 30` validation queries per dataset per seed (the harness enforces this; exceeding it aborts the run)
- Metric: test accuracy of the final returned architecture on the NAS-Bench-201 test split (one extra query at the end, not counted against the budget)
- Seeds: {0, 1, 2, 3, 4}. Report mean ± std across seeds — at K = 30 variance is non-trivial, and a strategy that happens to get lucky on seed 42 is not a real improvement.
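The budget enforcement described above might look like the following sketch. The real `BenchmarkAPI` is fixed harness code; apart from `query_val_accuracy` and `BudgetExceededError`, which the task text names, every identifier here is an assumption for illustration.

```python
# Hypothetical stand-in for the harness's budget-enforcing benchmark API.
class BudgetExceededError(RuntimeError):
    pass

class CountingAPI:
    def __init__(self, table: dict, budget: int = 30):
        self._table = table    # arch tuple -> validation accuracy
        self._budget = budget
        self.queries = 0

    def query_val_accuracy(self, arch: list[int]) -> float:
        if self.queries >= self._budget:
            raise BudgetExceededError(f"budget of {self._budget} queries exhausted")
        self.queries += 1      # note: re-querying the same arch still counts
        return self._table[tuple(arch)]
```

Note the comment on the counter: caching or re-querying does not stretch the budget, which is exactly the loophole the "What does not count" section below rules out.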
What Counts as a Contribution
Acceptable research directions (this list is not exhaustive):
- Better acquisition functions: e.g. UCB/EI over a learned predictor, Thompson sampling, information-theoretic criteria.
- Better surrogate models: GPs on path-encoded architectures, GNN predictors, MLP ensembles, zero-cost proxy hybrids (Mellor et al., 2021; Abdelfattah et al., 2021).
- Smarter exploration–exploitation mixing: local search around Pareto front, portfolio methods, warm-started evolution.
- Encoding choices: adjacency vs path encoding (White et al., 2020 showed path encoding substantially improves predictor accuracy at low K).
What does not count:
- Increasing the effective budget (e.g. re-querying the same architecture, wrapping queries, etc.). The harness counts every call to `api.query_val_accuracy` and will terminate after K = 30.
- Hard-coding known good architectures from NAS-Bench-201 literature.
Baselines (all under the same K = 30 budget)
| Name | Strategy |
|---|---|
| `random_search` | Uniform sampling over valid architectures |
| `rea` | Regularized Evolution (Real et al., 2019) with tournament selection and 1-edge mutation |
| `bananas` | Predictor-guided: MLP ensemble over path encodings, pick candidate with highest predicted val_acc (White et al., 2021) |
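The `rea` row above (tournament selection plus 1-edge mutation) can be sketched as follows. This is a hedged reconstruction, not the read-only reference implementation: population size, tournament size, and the `query` callback are assumed for illustration.

```python
# Sketch of Regularized Evolution (Real et al., 2019) under a query budget.
import random
from collections import deque

def mutate(arch: list[int], rng: random.Random) -> list[int]:
    """Change the operation on one uniformly chosen edge (1-edge mutation)."""
    child = list(arch)
    edge = rng.randrange(6)
    child[edge] = rng.choice([op for op in range(5) if op != child[edge]])
    return child

def rea(query, budget=30, pop_size=10, tournament=3, rng=None):
    rng = rng or random.Random(0)
    population = deque()
    history = []
    for _ in range(min(pop_size, budget)):          # random warm-up
        arch = [rng.randrange(5) for _ in range(6)]
        population.append((arch, query(arch)))
        history.append(population[-1])
    while len(history) < budget:
        sample = rng.sample(list(population), min(tournament, len(population)))
        parent = max(sample, key=lambda p: p[1])[0]  # tournament winner
        child = mutate(parent, rng)
        population.append((child, query(child)))
        history.append(population[-1])
        population.popleft()   # "regularized": evict the OLDEST, not the worst
    return max(history, key=lambda p: p[1])[0]
```

The `popleft` line is the defining design choice: aging out the oldest member (rather than the worst) keeps the population turning over and is what distinguishes regularized evolution from plain tournament evolution.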
Code
```python
# Custom NAS optimizer for MLS-Bench (NAS-Bench-201, sample-efficient regime)
#
# EDITABLE section: NASOptimizer class — implement your search strategy.
# FIXED sections: everything else (search space, benchmark API, evaluation loop).
#
# The NAS-Bench-201 search space has 15625 architectures (5 ops, 6 edges).
# Evaluation is tabular — query the benchmark for any architecture's accuracy.
# No actual neural network training is needed.
#
# IMPORTANT: You have a STRICT budget of NAS_EPOCHS validation queries
# (default 30). The BenchmarkAPI enforces this and will raise
# BudgetExceededError if you exceed it. One final test query at the end is
# free and not counted against the budget.
import os
import sys
```
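For orientation, a hypothetical skeleton of the editable class. The real interface lives in `custom_nas_search.py`; only the class name `NASOptimizer` comes from the task text, and the suggest/observe/best method names here are assumptions, not the harness's actual API.

```python
# Hypothetical NASOptimizer skeleton (random-search behavior as a placeholder).
import random

class NASOptimizer:
    def __init__(self, num_edges: int = 6, num_ops: int = 5, seed: int = 0):
        self.num_edges = num_edges
        self.num_ops = num_ops
        self.rng = random.Random(seed)
        self.history = []  # (arch, val_acc) pairs observed so far

    def suggest(self) -> list[int]:
        """Propose the next architecture to query (uniform random here;
        replace with your acquisition / surrogate / evolution logic)."""
        return [self.rng.randrange(self.num_ops) for _ in range(self.num_edges)]

    def observe(self, arch: list[int], val_acc: float) -> None:
        """Record the validation accuracy returned by the benchmark."""
        self.history.append((list(arch), val_acc))

    def best(self) -> list[int]:
        """Architecture to return for the final (free) test query."""
        return max(self.history, key=lambda p: p[1])[0]
```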
Results
No results yet.