optimization-bilevel

Optimization · penalized-bilevel-gradient-descent · rigorous codebase

Description

Optimization Bilevel

Research Question

Can you improve a fixed bilevel-optimization benchmark, based on Shen and Chen's penalty-based bilevel gradient descent experiments, by selecting the best-suited method among those supported and tuning only paper-style strategy hyperparameters?

What You Can Modify

Edit only penalized-bilevel-gradient-descent/mlsbench/custom_strategy.py inside the editable block containing:

  1. get_toy_strategy()
  2. get_hyperclean_strategy(net)

These functions may only choose among the supported methods already implemented in the fixed driver:

  • Toy mode: v_pbgd, g_pbgd
  • Data hyper-cleaning mode: v_pbgd, g_pbgd, rhg, t_rhg

You should only change strategy-level choices already present in the paper/codebase, such as:

  • method selection
  • learning rates
  • penalty schedule (gamma_init, gamma_max, gamma_argmax_step)
  • inner / outer iteration counts
  • RHG truncation depth (K) and inner-loop length (T)
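The penalty-schedule parameters suggest a penalty coefficient that ramps up over the outer loop. A minimal sketch of one plausible schedule (the linear-ramp shape is an assumption inferred from the parameter names, not taken from the driver):

```python
def gamma_schedule(step: int, gamma_init: float, gamma_max: float,
                   gamma_argmax_step: int) -> float:
    # Hypothetical linear ramp: grow the penalty coefficient from gamma_init
    # to gamma_max over the first gamma_argmax_step outer iterations, then
    # hold it constant. PBGD-style methods increase the penalty so the
    # penalized single-level problem tracks the bilevel solution.
    if step >= gamma_argmax_step:
        return gamma_max
    return gamma_init + (step / gamma_argmax_step) * (gamma_max - gamma_init)
```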

Do not rewrite the driver, dataset split, pollution protocol, metrics, or model architectures.
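As a rough sketch of what the two editable hooks might return, the snippet below uses a plain dict of strategy choices. All key names and values here are illustrative assumptions; the fixed driver defines the real schema and defaults:

```python
def get_toy_strategy():
    # Toy mode may only choose between v_pbgd and g_pbgd. The hyperparameter
    # keys (outer_lr, gamma_init, ...) are hypothetical names for the
    # strategy-level choices the task allows you to tune.
    return {
        "method": "v_pbgd",
        "outer_lr": 0.05,
        "inner_lr": 0.05,
        "gamma_init": 0.1,
        "gamma_max": 10.0,
        "gamma_argmax_step": 2000,
    }

def get_hyperclean_strategy(net):
    # Hyper-cleaning additionally supports rhg and t_rhg; truncated RHG
    # exposes a truncation depth K and an inner-loop length T. `net` could
    # be used to pick per-architecture settings (linear vs. MLP).
    return {
        "method": "t_rhg",
        "outer_lr": 0.01,
        "K": 10,
        "T": 100,
    }
```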

Fixed Setup

Toy / Numerical Verification

  • Problem definition follows Section 5.1 / 6.1 of the paper
  • x is projected to [0, 3]
  • 1000 random initial points are sampled as in the official toy script
  • Primary metric: convergence_steps
  • Secondary metrics: success_rate, final_residual, runtime_sec
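The box projection and the primary metric can be sketched as follows; the tolerance and the exact stopping rule are assumptions, since the driver defines the actual convergence test:

```python
def project(x, lo=0.0, hi=3.0):
    # The toy variable x is clipped to the box [0, 3] after every update.
    return max(lo, min(hi, x))

def count_convergence_steps(step_fn, x0, tol=1e-3, max_steps=20000):
    # Illustrative convergence_steps metric: the first iteration at which a
    # projected update moves x by no more than tol. step_fn is any
    # user-supplied update rule (e.g. one projected gradient step).
    x = project(x0)
    for k in range(1, max_steps + 1):
        x_new = project(step_fn(x))
        if abs(x_new - x) <= tol:
            return k
        x = x_new
    return max_steps
```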

Data Hyper-Cleaning

  • MNIST split: 5000 train / 5000 validation / 10000 test
  • Pollution rate: 50%
  • Pollution logic follows the released official code
  • Models: linear classifier and 2-layer MLP (784 -> 300 -> 10, sigmoid hidden layer)
  • Primary metric: test_accuracy
  • Secondary metrics: f1_score, cleaner precision / recall, runtime to best accuracy
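Cleaner precision / recall score how well the learned per-example weights identify polluted training labels. A minimal sketch, assuming a sign convention in which non-positive weights flag an example as polluted (the threshold and convention are assumptions, not the driver's definition):

```python
def cleaner_precision_recall(weights, polluted_mask, threshold=0.0):
    # An example is "flagged" as polluted when its learned weight falls at
    # or below the threshold; precision/recall compare the flags against
    # the ground-truth pollution mask.
    flagged = [w <= threshold for w in weights]
    tp = sum(f and p for f, p in zip(flagged, polluted_mask))
    fp = sum(f and not p for f, p in zip(flagged, polluted_mask))
    fn = sum((not f) and p for f, p in zip(flagged, polluted_mask))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```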

Reference Files

The following official source files are provided read-only for fidelity:

  • penalized-bilevel-gradient-descent/V-PBGD/toy/toy.py
  • penalized-bilevel-gradient-descent/V-PBGD/data-hyper-cleaning/data_hyper_clean.py
  • penalized-bilevel-gradient-descent/G-PBGD/data_hyper_clean_gpbgd.py
  • penalized-bilevel-gradient-descent/RHG/data_hyper_clean_rhg.py
  • penalized-bilevel-gradient-descent/RHG/hypergrad/hypergradients.py

Evaluation

The task runs three benchmark commands:

  1. toy-convergence
  2. hyperclean-linear
  3. hyperclean-mlp

Each command prints structured TRAIN_METRICS and FINAL_METRICS lines. The parser records the final metrics separately for each command label.
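A parser for such output might look like the following sketch, assuming each metrics line is a `FINAL_METRICS` prefix followed by a JSON object (the actual line format is defined by the fixed driver):

```python
import json

def parse_final_metrics(log_text: str) -> dict:
    # Collect the key/value pairs from every FINAL_METRICS line in a
    # command's output; later lines override earlier ones, so the last
    # reported value wins.
    metrics = {}
    for line in log_text.splitlines():
        if line.startswith("FINAL_METRICS"):
            metrics.update(json.loads(line[len("FINAL_METRICS"):].strip()))
    return metrics
```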

Code

custom_strategy.py
"""Optimization-bilevel scaffold for MLS-Bench.

The fixed driver reproduces the numerical verification and data hyper-cleaning
experiments from Shen and Chen, "On Penalty-based Bilevel Gradient Descent
Method" (ICML 2023 / Mathematical Programming 2025) while exposing only the
method choice and official hyperparameters as editable strategy hooks.
"""

from __future__ import annotations

import argparse
import json
import math
import os
import random

Additional context files (read-only):

  • penalized-bilevel-gradient-descent/RHG/hypergrad/hypergradients.py

Results

| Model | Type | Toy conv. steps | Toy final residual | Toy final proj. grad | Toy success rate | Linear test acc. | Linear F1 | Linear cleaner prec. | Linear cleaner recall | MLP test acc. | MLP F1 | MLP cleaner prec. | MLP cleaner recall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| g_pbgd | baseline | 303.686 | 0.081 | 0.000 | 1.000 | 89.837 | 80.629 | 0.839 | 0.776 | 92.380 | 90.820 | 0.889 | 0.928 |
| rhg | baseline | 260.712 | 0.030 | 0.000 | 1.000 | 84.633 | 89.547 | 0.832 | 0.969 | 84.790 | 89.339 | 0.822 | 0.979 |
| t_rhg | baseline | 260.712 | 0.030 | 0.000 | 1.000 | 84.613 | 89.059 | 0.828 | 0.964 | 84.790 | 89.348 | 0.824 | 0.976 |
| v_pbgd | baseline | 260.712 | 0.030 | 0.000 | 1.000 | 90.097 | 91.722 | 0.885 | 0.952 | 91.480 | 92.050 | 0.887 | 0.956 |
| anthropic/claude-opus-4.6 | vanilla | 147.298 | 0.033 | 0.000 | 1.000 | – | – | – | – | 92.410 | 91.030 | 0.890 | 0.932 |
| deepseek-reasoner | vanilla | 261.256 | 0.030 | 0.000 | 1.000 | 90.080 | 91.811 | 0.884 | 0.955 | 91.480 | 92.050 | 0.887 | 0.956 |
| google/gemini-3.1-pro-preview | vanilla | 147.298 | 0.033 | 0.000 | 1.000 | 90.090 | 91.771 | 0.882 | 0.957 | 92.190 | 90.852 | 0.888 | 0.930 |
| openai/gpt-5.4-pro | vanilla | 261.256 | 0.030 | 0.000 | 1.000 | 90.080 | 91.811 | 0.884 | 0.955 | 92.380 | 90.820 | 0.889 | 0.928 |
| qwen3.6-plus:free | vanilla | 20000.000 | 0.547 | 16.683 | 0.000 | 89.500 | 60.473 | 0.831 | 0.475 | 91.450 | 91.103 | 0.886 | 0.938 |
| anthropic/claude-opus-4.6 | agent | 147.298 | 0.033 | 0.000 | 1.000 | 90.100 | 91.782 | 0.883 | 0.956 | 92.640 | 91.738 | 0.890 | 0.946 |
| deepseek-reasoner | agent | 7363.511 | 0.151 | 3.812 | 0.634 | 89.580 | 91.393 | 0.865 | 0.968 | 91.370 | 91.223 | 0.886 | 0.940 |
| google/gemini-3.1-pro-preview | agent | 3374.788 | 0.142 | 3.663 | 0.835 | – | – | – | – | 93.190 | 89.631 | 0.890 | 0.902 |
| openai/gpt-5.4-pro | agent | 261.256 | 0.030 | 0.000 | 1.000 | 90.080 | 91.811 | 0.884 | 0.955 | 92.550 | 91.154 | 0.890 | 0.934 |
| qwen3.6-plus:free | agent | 261.256 | 0.030 | 0.000 | 1.000 | – | – | – | – | 92.380 | 90.820 | 0.889 | 0.928 |
| qwen3.6-plus:free | agent | 129.169 | 0.067 | 0.000 | 1.000 | 90.080 | 91.811 | 0.884 | 0.955 | 92.380 | 90.820 | 0.889 | 0.928 |

Agent Conversations