optimization-nas


Description

Sample-Efficient Neural Architecture Search

Objective

Design and implement a novel sample-efficient NAS optimizer that discovers high-performing architectures in the NAS-Bench-201 search space under a strict query budget. Your code goes in the NASOptimizer class in custom_nas_search.py. Three reference implementations (Random Search, REA, and a BANANAS-style predictor-guided search) are provided as read-only.

Research Question

With only K = 30 architecture evaluations, how can a search strategy maximize the expected accuracy of the best-found architecture?

This is the regime in which real-world NAS is actually hard: the full benchmark contains 15,625 architectures, but the agent can only query 30 of them, so naïve enumeration is impossible and algorithmic differences are load-bearing. Sample-efficient NAS has been studied by BANANAS (White et al., AAAI 2021), NPENAS (Wang et al., TNNLS 2022), and NAS-Bench-Suite (White et al., 2022), and these studies consistently show a measurable gap between random search, regularized evolution, and predictor-guided methods at K ≤ 50.

Search Space

  • NAS-Bench-201 cell: 4 nodes, 6 edges, 5 operations per edge
  • Operations: skip_connect, none, nor_conv_3x3, nor_conv_1x1, avg_pool_3x3
  • 5^6 = 15,625 architectures total
  • An architecture is represented as a list of 6 integers in [0, 4]
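As a concrete illustration of the representation above, the 6-integer edge-op list can be mapped to and from a flat index in the 5^6 = 15,625 space by treating it as a base-5 number. This is a sketch with hypothetical helper names; whether it matches the benchmark's own internal indexing is not specified here.

```python
# Illustrative helpers for the 6-integer architecture representation.
# The op names below are the five NAS-Bench-201 operations; the base-5
# index mapping is an assumption for exposition, not the benchmark API.
OPS = ["skip_connect", "none", "nor_conv_3x3", "nor_conv_1x1", "avg_pool_3x3"]

def arch_to_index(arch):
    """Treat the 6 edge-op integers (each in [0, 4]) as a base-5 number."""
    assert len(arch) == 6 and all(0 <= op <= 4 for op in arch)
    idx = 0
    for op in arch:
        idx = idx * 5 + op
    return idx

def index_to_arch(idx):
    """Inverse of arch_to_index: recover the 6 edge-op integers."""
    arch = []
    for _ in range(6):
        arch.append(idx % 5)
        idx //= 5
    return arch[::-1]
```

Under this mapping, `[0, 0, 0, 0, 0, 0]` is index 0 and `[4, 4, 4, 4, 4, 4]` is index 15,624, covering all 15,625 architectures exactly once.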

Evaluation Protocol

  • Datasets: CIFAR-10, CIFAR-100, ImageNet16-120 (three separate settings)
  • Query budget: NAS_EPOCHS = 30 validation queries per dataset per seed (the harness enforces this; exceeding it aborts the run)
  • Metric: test accuracy of the final returned architecture on the NAS-Bench-201 test split (one extra query at the end, not counted against the budget)
  • Seeds: {0, 1, 2, 3, 4}. Report mean ± std across seeds — at K = 30 variance is non-trivial, and a strategy that happens to get lucky on a single seed is not a real improvement.
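The budget enforcement described above can be pictured with a minimal wrapper. This is an illustrative sketch, not the harness's actual BenchmarkAPI; only the behavior (count every validation query, abort past the budget) is taken from the protocol.

```python
# Sketch of budget-enforcing behavior (illustrative; the harness's real
# BenchmarkAPI is the authority on names and semantics).
class BudgetExceededError(RuntimeError):
    pass

class BudgetedAPI:
    def __init__(self, lookup, budget=30):
        self._lookup = lookup   # maps arch tuple -> validation accuracy
        self._budget = budget   # NAS_EPOCHS
        self.queries = 0

    def query_val_accuracy(self, arch):
        # Every call counts against the budget, including repeats of the
        # same architecture -- re-querying does not buy extra information.
        if self.queries >= self._budget:
            raise BudgetExceededError(f"exceeded {self._budget} queries")
        self.queries += 1
        return self._lookup[tuple(arch)]
```

The final test query is the one call that sits outside this counter, per the protocol above.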

What Counts as a Contribution

Acceptable research directions (this list is not exhaustive):

  • Better acquisition functions: e.g. UCB/EI over a learned predictor, Thompson sampling, information-theoretic criteria.
  • Better surrogate models: GPs on path-encoded architectures, GNN predictors, MLP ensembles, zero-cost proxy hybrids (Mellor et al., 2021; Abdelfattah et al., 2021).
  • Smarter exploration–exploitation mixing: local search around Pareto front, portfolio methods, warm-started evolution.
  • Encoding choices: adjacency vs path encoding (White et al., 2020 showed path encoding substantially improves predictor accuracy at low K).
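To make the encoding and acquisition bullets concrete, here is a sketch of the simplest predictor input (a one-hot encoding of the 6 edge ops) and a UCB acquisition over an ensemble's predictions. A proper path encoding additionally walks the cell DAG (White et al., 2020); the 30-dimensional one-hot form below is the simpler adjacency-style alternative, and `beta` is an illustrative hyperparameter.

```python
import numpy as np

def one_hot_encode(arch, n_ops=5):
    """Each of the 6 edge ops becomes a one-hot block -> 30-dim vector."""
    x = np.zeros(len(arch) * n_ops)
    for i, op in enumerate(arch):
        x[i * n_ops + op] = 1.0
    return x

def ucb_score(predictions, beta=0.5):
    """UCB over an ensemble: mean predicted val_acc plus beta times the
    ensemble disagreement (std), for a single candidate architecture."""
    return float(np.mean(predictions) + beta * np.std(predictions))
```

Under UCB, candidates where the ensemble members disagree get an exploration bonus, which is one way to trade off exploration against exploitation at low K.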

What does not count:

  • Increasing the effective budget (e.g. re-querying the same architecture, wrapping queries, etc.). The harness counts every call to api.query_val_accuracy and will terminate after K = 30.
  • Hard-coding known good architectures from NAS-Bench-201 literature.

Baselines (all under the same K = 30 budget)

  Name            Strategy
  random_search   Uniform sampling over valid architectures
  rea             Regularized Evolution (Real et al., 2019) with tournament selection and 1-edge mutation
  bananas         Predictor-guided: MLP ensemble over path encodings, pick candidate with highest predicted val_acc (White et al., 2021)
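The `rea` baseline's two moving parts can be sketched as follows. This is an illustration of tournament selection and 1-edge mutation, not the provided read-only implementation; function names and the tournament size are assumptions.

```python
import random

def mutate_one_edge(arch, n_ops=5):
    """Copy the parent and resample the op on exactly one random edge."""
    child = list(arch)
    edge = random.randrange(len(child))
    choices = [op for op in range(n_ops) if op != child[edge]]
    child[edge] = random.choice(choices)
    return child

def tournament(population, k=5):
    """Pick the best architecture among k randomly sampled
    (arch, val_acc) pairs from the current population."""
    sample = random.sample(population, min(k, len(population)))
    return max(sample, key=lambda pair: pair[1])[0]
```

Regularized Evolution then repeatedly queries the mutated child, appends it to the population, and drops the oldest member, spending one budget unit per generation.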

Code

custom_nas_search.py
# Custom NAS optimizer for MLS-Bench (NAS-Bench-201, sample-efficient regime)
#
# EDITABLE section: NASOptimizer class — implement your search strategy.
# FIXED sections: everything else (search space, benchmark API, evaluation loop).
#
# The NAS-Bench-201 search space has 15625 architectures (5 ops, 6 edges).
# Evaluation is tabular — query the benchmark for any architecture's accuracy.
# No actual neural network training is needed.
#
# IMPORTANT: You have a STRICT budget of NAS_EPOCHS validation queries
# (default 30). The BenchmarkAPI enforces this and will raise
# BudgetExceededError if you exceed it. One final test query at the end is
# free and not counted against the budget.
import os
import sys

Results

No results yet.