Tasks
Browse all benchmark tasks. Click a task to see its description, leaderboard, and agent conversation logs.
170 tasks
| Task | Category | Packages | Baselines | Envs | Logs |
|---|---|---|---|---|---|
| agent-tool-reasoning | Agent Reasoning | stabletoolbench | 9 | 1 | - |
| ai4bio-antibody-binding-scoring | AI for Biology | AbBiBench | 3 | 3 | - |
| ai4bio-antibody-cdr-design | AI for Biology | chimera-bench | 3 | 3 | - |
| ai4bio-mutation-effect-prediction | AI for Biology | ProteinGym | 3 | 3 | - |
| ai4bio-protein-function-prediction | AI for Biology | DeepProtein | 3 | 3 | - |
| ai4bio-protein-inverse-folding | AI for Biology | ProteinInvBench | 3 | 3 | - |
| ai4bio-protein-structure-repr | AI for Biology | ProteinWorkshop | 3 | 3 | - |
| ai4sci-climate-emulation | AI for Science | ClimSim | 5 | 3 | - |
| ai4sci-inverse-diffusion-algo | AI for Science | InverseBench | 3 | 3 | - |
| ai4sci-mol-property-prediction | AI for Science | Uni-Mol | 3 | 3 | - |
| ai4sci-pla-binding-affinity | AI for Science | EHIGN_PLA | 4 | 3 | - |
| ai4sci-vs-contrastive-scoring | AI for Science | HypSeek | 3 | 4 | - |
| ai4sci-weather-forecast-aggregation | AI for Science | ClimaX | 4 | 3 | - |
| ar-video-kv-temporal-policy | Autonomous Robotics | FAR | 4 | 3 | - |
| causal-discovery-discrete | Causal Inference | causal-bnlearn | 5 | 5 | ✓ |
| causal-observational-linear-gaussian | Causal Inference | causal-learn | 4 | 5 | ✓ |
| causal-observational-linear-non-gaussian | Causal Inference | causal-learn | 3 | 3 | ✓ |
| causal-observational-nonlinear | Causal Inference | causal-learn | 4 | 3 | ✓ |
| causal-treatment-effect | Causal Inference | scikit-learn | 6 | 3 | ✓ |
| cv-3dgs-densification | Computer Vision | gsplat | 4 | 4 | ✓ |
| cv-classification-loss | Computer Vision | pytorch-vision | 3 | 3 | ✓ |
| cv-data-augmentation | Computer Vision | pytorch-vision | 3 | 3 | ✓ |
| cv-dbm-sampler | Computer Vision | dbim-codebase | 3 | 2 | ✓ |
| cv-dbm-scheduler | Computer Vision | dbim-codebase | 4 | 2 | ✓ |
| cv-diffusion-architecture | Computer Vision | diffusers-main | 3 | 3 | ✓ |
| cv-diffusion-cfg | Computer Vision | CFGpp-main | 3 | 3 | ✓ |
| cv-diffusion-conditioning | Computer Vision | diffusers-main | 3 | 3 | ✓ |
| cv-diffusion-efficiency | Computer Vision | CFGpp-main | 3 | 3 | ✓ |
| cv-diffusion-prediction | Computer Vision | diffusers-main | 3 | 3 | ✓ |
| cv-flowmaps-training | Computer Vision | flow-maps | 3 | 3 | - |
| cv-meanflow-perceptual-loss | Computer Vision | alphaflow-main | 3 | 3 | ✓ |
| cv-meanflow-training | Computer Vision | alphaflow-main | 3 | 3 | - |
| cv-multitask-loss | Computer Vision | pytorch-vision | 3 | 3 | ✓ |
| cv-pooling-aggregation | Computer Vision | pytorch-vision | 3 | 3 | ✓ |
| cv-sample-weighting | Computer Vision | pytorch-vision | 3 | 3 | ✓ |
| cv-vae-loss | Computer Vision | diffusers-main | 3 | 3 | ✓ |
| dex-retarget | Other | dex-retargeting | 3 | 3 | - |
| dl-activation-function | Deep Learning | pytorch-vision | 3 | 3 | ✓ |
| dl-lr-schedule | Deep Learning | pytorch-vision | 3 | 3 | ✓ |
| dl-normalization | Deep Learning | pytorch-vision | 3 | 3 | ✓ |
| dl-regularization | Deep Learning | pytorch-vision | 3 | 3 | ✓ |
| dl-residual-connection | Deep Learning | pytorch-vision | 3 | 3 | ✓ |
| dl-weight-initialization | Deep Learning | pytorch-vision | 3 | 3 | ✓ |
| dlm-dkv-policy | Deep Learning | dLLM-cache | 6 | 4 | - |
| graph-generation | Graph Learning | pytorch-geometric | 5 | 3 | - |
| graph-graph-classification | Graph Learning | pytorch-geometric | 6 | 3 | - |
| graph-link-prediction | Graph Learning | pytorch-geometric-lp | 6 | 3 | - |
| graph-node-classification | Graph Learning | pytorch-geometric | 6 | 3 | - |
| graph-signal-propagation | Graph Learning | ChebNetII | 7 | 4 | - |
| graph-temporal | Graph Learning | BasicTS | 6 | 3 | - |
| humanoid-ppo-extractor | Other | humanoid-bench | 6 | 3 | - |
| jepa-planning | Deep Learning | eb_jepa | 3 | 3 | - |
| jepa-prediction-loss | Deep Learning | eb_jepa | 3 | 3 | - |
| jepa-regularizer | Deep Learning | eb_jepa | 3 | 3 | - |
| libero-lifelong | Reinforcement Learning | LIBERO | 3 | 1 | - |
| llm-algorithm-16Mqat | Language Models | llm-16m-qat-runtime | 3 | 3 | - |
| llm-dllm-demask-strategy | Language Models | LLaDA | 3 | 3 | ✓ |
| llm-hybrid-posttraining | Language Models | verl | 4 | 1 | - |
| llm-kv-adaptive-quantization | ML Systems | transformers-kv-lab | 5 | 3 | - |
| llm-kv-selection-budgeting | ML Systems | FastKV | 5 | 4 | - |
| llm-kv-structural-reduction | ML Systems | nanoGPT | 4 | 5 | - |
| llm-offline-rl | Language Models | LLaMA-Factory, MathRuler, alpaca_eval | 2 | 3 | ✓ |
| llm-pretrain-attention | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | ✓ |
| llm-pretrain-bitlinear | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | ✓ |
| llm-pretrain-embedding | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | ✓ |
| llm-pretrain-kernel | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | ✓ |
| llm-pretrain-linear-attention | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | ✓ |
| llm-pretrain-loss | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | ✓ |
| llm-pretrain-lr-schedule | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | ✓ |
| llm-pretrain-mlp | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | ✓ |
| llm-pretrain-normalization | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | ✓ |
| llm-pretrain-optimizer | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | ✓ |
| llm-pretrain-residual | Language Models | lm-evaluation-harness, nanoGPT | 4 | 2 | ✓ |
| llm-ptq-algorithm | Language Models | gptq | 3 | 3 | - |
| llm-qat-algorithm | Language Models | gptq | 3 | 3 | - |
| llm-rl-advantage | Language Models | verl | 3 | 1 | - |
| llm-rl-advantage-1.5b-probe | Language Models | verl | 1 | 1 | - |
| llm-rl-importance-sampling | Language Models | verl | 3 | 1 | - |
| llm-scaling-law-discovery | Language Models | scaling-law-lab | 4 | 3 | - |
| llm-sft-loss | Language Models | LLaMA-Factory, lm-evaluation-harness | 4 | 2 | - |
| llm-ttrl-reward | Language Models | ttrl | 3 | 3 | - |
| llm-ttt-adaptation | Language Models | nanoGPT | 3 | 1 | - |
| marl-centralized-critic | Other | epymarl | 3 | 3 | ✓ |
| marl-mixing-network | Other | epymarl | 3 | 3 | - |
| mas-topology | Deep Learning | chatdev-macnet | 3 | 2 | - |
| meta-fewshot-classification | Classical ML | easy-few-shot-learning | 3 | 3 | - |
| meta-inner-loop-optimizer | Classical ML | learn2learn | 3 | 3 | - |
| meta-rl | Classical ML | oyster | 3 | 3 | - |
| meta-rl-algorithm | Classical ML | oyster | 3 | 3 | - |
| ml-active-learning | Classical ML | badge | 5 | 3 | ✓ |
| ml-anomaly-detection | Classical ML | scikit-learn | 5 | 4 | ✓ |
| ml-calibration | Classical ML | scikit-learn | 3 | 4 | ✓ |
| ml-clustering-algorithm | Classical ML | scikit-learn | 3 | 3 | ✓ |
| ml-continual-regularization | Classical ML | continual-learning | 4 | 3 | ✓ |
| ml-dimensionality-reduction | Classical ML | scikit-learn | 5 | 3 | ✓ |
| ml-ensemble-boosting | Classical ML | scikit-learn | 3 | 3 | ✓ |
| ml-feature-selection | Classical ML | scikit-learn | 3 | 3 | ✓ |
| ml-federated-aggregation | Classical ML | flower | 3 | 3 | ✓ |
| ml-missing-data-imputation | Classical ML | scikit-learn | 5 | 3 | ✓ |
| ml-selective-deferral | Classical ML | scikit-learn | 4 | 4 | ✓ |
| ml-subgroup-calibration-shift | Classical ML | scikit-learn | 4 | 3 | ✓ |
| ml-symbolic-regression | Classical ML | gplearn | 3 | 3 | ✓ |
| mlsys-fused-attention | ML Systems | flash-attention | 3 | 3 | - |
| mlsys-moe-load-balance | ML Systems | eplb | 3 | 4 | - |
| mlsys-sparse-attention | ML Systems | SpargeAttn | 3 | 3 | - |
| optimization-bilevel | Optimization | penalized-bilevel-gradient-descent | 4 | 3 | ✓ |
| optimization-convex-concave | Optimization | RAIN | 4 | 3 | ✓ |
| optimization-diagonal-net | Optimization | RAIN | 4 | 4 | - |
| optimization-dp-sgd | Optimization | opacus | 4 | 3 | ✓ |
| optimization-evolution-strategy | Optimization | deap | 4 | 4 | ✓ |
| optimization-gradient-compression | Optimization | pytorch-vision | 3 | 3 | ✓ |
| optimization-hyperparameter-search | Optimization | scikit-learn | 6 | 3 | ✓ |
| optimization-multi-objective | Optimization | deap | 6 | 4 | ✓ |
| optimization-nas | Optimization | naslib | 3 | 3 | ✓ |
| optimization-online-bandit | Optimization | SMPyBandits | 3 | 3 | ✓ |
| optimization-pac-bayes-bound | Optimization | PBB | 3 | 3 | ✓ |
| optimization-parity | Optimization | pytorch-examples | 3 | 3 | ✓ |
| optimization-variance-reduction | Optimization | opt-vr-bench | 6 | 3 | ✓ |
| pde-autoregressive-solver | PDE Solvers | Neural-Solver-Library | 3 | 4 | - |
| pde-design-solver | PDE Solvers | Neural-Solver-Library | 3 | 3 | - |
| pde-steady-solver | PDE Solvers | Neural-Solver-Library | 3 | 3 | - |
| quant-concept-drift | Quantitative Finance | qlib | 3 | 3 | ✓ |
| quant-graph-stock | Quantitative Finance | qlib | 3 | 3 | ✓ |
| quant-stock-prediction | Quantitative Finance | qlib | 3 | 3 | ✓ |
| rl-gcrl-goal-representation | Reinforcement Learning | dual-goal-representations | 3 | 3 | - |
| rl-intrinsic-exploration | Reinforcement Learning | cleanrl | 3 | 3 | ✓ |
| rl-offline-adroit | Reinforcement Learning | CORL | 3 | 3 | ✓ |
| rl-offline-continuous | Reinforcement Learning | CORL | 3 | 3 | ✓ |
| rl-offline-discrete | Reinforcement Learning | d3rlpy | 4 | 3 | ✓ |
| rl-offline-off2on | Reinforcement Learning | CORL | 3 | 3 | - |
| rl-offline-policy | Reinforcement Learning | fql | 3 | 3 | - |
| rl-offline-pomdp | Reinforcement Learning | katakomba | 3 | 3 | ✓ |
| rl-offpolicy-continuous | Reinforcement Learning | cleanrl | 3 | 3 | ✓ |
| rl-offpolicy-sample-efficiency | Reinforcement Learning | FastTD3 | 3 | 3 | - |
| rl-onpolicy-continuous | Reinforcement Learning | cleanrl | 3 | 3 | ✓ |
| rl-reward-learning | Reinforcement Learning | imitation | 3 | 3 | ✓ |
| rl-value-atari | Reinforcement Learning | cleanrl | 3 | 3 | ✓ |
| rl-value-discrete | Reinforcement Learning | cleanrl | 3 | 3 | ✓ |
| robo-diffusion-guidance | Other | cleandiffuser | 3 | 3 | - |
| robo-diffusion-planner | Other | cleandiffuser | 3 | 3 | - |
| robo-diffusion-policy | Other | cleandiffuser | 3 | 3 | - |
| robo-diffusion-sampling-method | Other | cleandiffuser | 3 | 3 | - |
| robo-humanoid-sim2real-algo | Other | humanoid-gym | 3 | 4 | - |
| robo-humanoid-sim2real-reward | Other | humanoid-gym | 4 | 4 | - |
| robomimic-bc-loss | Other | robomimic | 3 | 3 | - |
| robomimic-iql-vf | Other | robomimic | 3 | 3 | - |
| robomimic-obs-encoder | Other | robomimic | 3 | 3 | - |
| safe-exploration-fixed-budget | Reinforcement Learning | omnisafe | 4 | 3 | - |
| safe-rl | Reinforcement Learning | omnisafe | 3 | 3 | ✓ |
| security-adversarial-attack-black-box-score | Adversarial ML | torchattacks | 3 | 5 | ✓ |
| security-adversarial-attack-sparse-l0 | Adversarial ML | torchattacks | 4 | 5 | ✓ |
| security-adversarial-attack-white-box-linf | Adversarial ML | torchattacks | 4 | 5 | ✓ |
| security-adversarial-training | Adversarial ML | torchattacks | 3 | 4 | ✓ |
| security-backdoor-defense | Adversarial ML | pytorch-vision | 4 | 3 | ✓ |
| security-machine-unlearning | Adversarial ML | pytorch-vision | 4 | 3 | ✓ |
| security-membership-inference-defense | Adversarial ML | pytorch-vision | 4 | 3 | ✓ |
| security-poison-robust-learning | Adversarial ML | pytorch-vision | 4 | 3 | ✓ |
| speech-asr-encoder | Speech Processing | speechbrain | 3 | 3 | - |
| speech-enhancement | Speech Processing | speechbrain | 3 | 3 | - |
| speech-vocoder | Speech Processing | speechbrain | 3 | 3 | - |
| stf-traffic-forecast | Time Series | BasicTS | 7 | 3 | - |
| tdmpc2-planning | Other | tdmpc2 | 3 | 3 | - |
| tdmpc2-simnorm | Other | tdmpc2 | 4 | 3 | - |
| ts-anomaly-detection | Time Series | Time-Series-Library | 3 | 3 | ✓ |
| ts-classification | Time Series | Time-Series-Library | 3 | 3 | ✓ |
| ts-exogenous-forecast | Time Series | Time-Series-Library | 4 | 3 | ✓ |
| ts-imputation | Time Series | Time-Series-Library | 3 | 3 | ✓ |
| ts-long-term-forecast | Time Series | Time-Series-Library | 5 | 3 | ✓ |
| ts-short-term-forecast | Time Series | Time-Series-Library | 4 | 3 | ✓ |
| ttt-memory | Deep Learning | titans-lmm | 3 | 3 | - |