Tasks

Browse all benchmark tasks. Click a task to see its description, leaderboard, and agent conversation logs.

170 tasks
| Task | Category | Packages | Baselines | Envs | Logs |
|---|---|---|---|---|---|
| agent-tool-reasoning | Agent Reasoning | stabletoolbench | 9 | 1 | - |
| ai4bio-antibody-binding-scoring | AI for Biology | AbBiBench | 3 | 3 | - |
| ai4bio-antibody-cdr-design | AI for Biology | chimera-bench | 3 | 3 | - |
| ai4bio-mutation-effect-prediction | AI for Biology | ProteinGym | 3 | 3 | - |
| ai4bio-protein-function-prediction | AI for Biology | DeepProtein | 3 | 3 | - |
| ai4bio-protein-inverse-folding | AI for Biology | ProteinInvBench | 3 | 3 | - |
| ai4bio-protein-structure-repr | AI for Biology | ProteinWorkshop | 3 | 3 | - |
| ai4sci-climate-emulation | AI for Science | ClimSim | 5 | 3 | - |
| ai4sci-inverse-diffusion-algo | AI for Science | InverseBench | 3 | 3 | - |
| ai4sci-mol-property-prediction | AI for Science | Uni-Mol | 3 | 3 | - |
| ai4sci-pla-binding-affinity | AI for Science | EHIGN_PLA | 4 | 3 | - |
| ai4sci-vs-contrastive-scoring | AI for Science | HypSeek | 3 | 4 | - |
| ai4sci-weather-forecast-aggregation | AI for Science | ClimaX | 4 | 3 | - |
| ar-video-kv-temporal-policy | Autonomous Robotics | FAR | 4 | 3 | - |
| causal-discovery-discrete | Causal Inference | causal-bnlearn | 5 | 5 | |
| causal-observational-linear-gaussian | Causal Inference | causal-learn | 4 | 5 | |
| causal-observational-linear-non-gaussian | Causal Inference | causal-learn | 3 | 3 | |
| causal-observational-nonlinear | Causal Inference | causal-learn | 4 | 3 | |
| causal-treatment-effect | Causal Inference | scikit-learn | 6 | 3 | |
| cv-3dgs-densification | Computer Vision | gsplat | 4 | 4 | |
| cv-classification-loss | Computer Vision | pytorch-vision | 3 | 3 | |
| cv-data-augmentation | Computer Vision | pytorch-vision | 3 | 3 | |
| cv-dbm-sampler | Computer Vision | dbim-codebase | 3 | 2 | |
| cv-dbm-scheduler | Computer Vision | dbim-codebase | 4 | 2 | |
| cv-diffusion-architecture | Computer Vision | diffusers-main | 3 | 3 | |
| cv-diffusion-cfg | Computer Vision | CFGpp-main | 3 | 3 | |
| cv-diffusion-conditioning | Computer Vision | diffusers-main | 3 | 3 | |
| cv-diffusion-efficiency | Computer Vision | CFGpp-main | 3 | 3 | |
| cv-diffusion-prediction | Computer Vision | diffusers-main | 3 | 3 | |
| cv-flowmaps-training | Computer Vision | flow-maps | 3 | 3 | - |
| cv-meanflow-perceptual-loss | Computer Vision | alphaflow-main | 3 | 3 | |
| cv-meanflow-training | Computer Vision | alphaflow-main | 3 | 3 | - |
| cv-multitask-loss | Computer Vision | pytorch-vision | 3 | 3 | |
| cv-pooling-aggregation | Computer Vision | pytorch-vision | 3 | 3 | |
| cv-sample-weighting | Computer Vision | pytorch-vision | 3 | 3 | |
| cv-vae-loss | Computer Vision | diffusers-main | 3 | 3 | |
| dex-retarget | Other | dex-retargeting | 3 | 3 | - |
| dl-activation-function | Deep Learning | pytorch-vision | 3 | 3 | |
| dl-lr-schedule | Deep Learning | pytorch-vision | 3 | 3 | |
| dl-normalization | Deep Learning | pytorch-vision | 3 | 3 | |
| dl-regularization | Deep Learning | pytorch-vision | 3 | 3 | |
| dl-residual-connection | Deep Learning | pytorch-vision | 3 | 3 | |
| dl-weight-initialization | Deep Learning | pytorch-vision | 3 | 3 | |
| dlm-dkv-policy | Deep Learning | dLLM-cache | 6 | 4 | - |
| graph-generation | Graph Learning | pytorch-geometric | 5 | 3 | - |
| graph-graph-classification | Graph Learning | pytorch-geometric | 6 | 3 | - |
| graph-link-prediction | Graph Learning | pytorch-geometric-lp | 6 | 3 | - |
| graph-node-classification | Graph Learning | pytorch-geometric | 6 | 3 | - |
| graph-signal-propagation | Graph Learning | ChebNetII | 7 | 4 | - |
| graph-temporal | Graph Learning | BasicTS | 6 | 3 | - |
| humanoid-ppo-extractor | Other | humanoid-bench | 6 | 3 | - |
| jepa-planning | Deep Learning | eb_jepa | 3 | 3 | - |
| jepa-prediction-loss | Deep Learning | eb_jepa | 3 | 3 | - |
| jepa-regularizer | Deep Learning | eb_jepa | 3 | 3 | - |
| libero-lifelong | Reinforcement Learning | LIBERO | 3 | 1 | - |
| llm-algorithm-16Mqat | Language Models | llm-16m-qat-runtime | 3 | 3 | - |
| llm-dllm-demask-strategy | Language Models | LLaDA | 3 | 3 | |
| llm-hybrid-posttraining | Language Models | verl | 4 | 1 | - |
| llm-kv-adaptive-quantization | ML Systems | transformers-kv-lab | 5 | 3 | - |
| llm-kv-selection-budgeting | ML Systems | FastKV | 5 | 4 | - |
| llm-kv-structural-reduction | ML Systems | nanoGPT | 4 | 5 | - |
| llm-offline-rl | Language Models | LLaMA-Factory, MathRuler, alpaca_eval | 2 | 3 | |
| llm-pretrain-attention | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | |
| llm-pretrain-bitlinear | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | |
| llm-pretrain-embedding | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | |
| llm-pretrain-kernel | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | |
| llm-pretrain-linear-attention | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | |
| llm-pretrain-loss | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | |
| llm-pretrain-lr-schedule | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | |
| llm-pretrain-mlp | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | |
| llm-pretrain-normalization | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | |
| llm-pretrain-optimizer | Language Models | lm-evaluation-harness, nanoGPT | 3 | 2 | |
| llm-pretrain-residual | Language Models | lm-evaluation-harness, nanoGPT | 4 | 2 | |
| llm-ptq-algorithm | Language Models | gptq | 3 | 3 | - |
| llm-qat-algorithm | Language Models | gptq | 3 | 3 | - |
| llm-rl-advantage | Language Models | verl | 3 | 1 | - |
| llm-rl-advantage-1.5b-probe | Language Models | verl | 1 | 1 | - |
| llm-rl-importance-sampling | Language Models | verl | 3 | 1 | - |
| llm-scaling-law-discovery | Language Models | scaling-law-lab | 4 | 3 | - |
| llm-sft-loss | Language Models | LLaMA-Factory, lm-evaluation-harness | 4 | 2 | - |
| llm-ttrl-reward | Language Models | ttrl | 3 | 3 | - |
| llm-ttt-adaptation | Language Models | nanoGPT | 3 | 1 | - |
| marl-centralized-critic | Other | epymarl | 3 | 3 | |
| marl-mixing-network | Other | epymarl | 3 | 3 | - |
| mas-topology | Deep Learning | chatdev-macnet | 3 | 2 | - |
| meta-fewshot-classification | Classical ML | easy-few-shot-learning | 3 | 3 | - |
| meta-inner-loop-optimizer | Classical ML | learn2learn | 3 | 3 | - |
| meta-rl | Classical ML | oyster | 3 | 3 | - |
| meta-rl-algorithm | Classical ML | oyster | 3 | 3 | - |
| ml-active-learning | Classical ML | badge | 5 | 3 | |
| ml-anomaly-detection | Classical ML | scikit-learn | 5 | 4 | |
| ml-calibration | Classical ML | scikit-learn | 3 | 4 | |
| ml-clustering-algorithm | Classical ML | scikit-learn | 3 | 3 | |
| ml-continual-regularization | Classical ML | continual-learning | 4 | 3 | |
| ml-dimensionality-reduction | Classical ML | scikit-learn | 5 | 3 | |
| ml-ensemble-boosting | Classical ML | scikit-learn | 3 | 3 | |
| ml-feature-selection | Classical ML | scikit-learn | 3 | 3 | |
| ml-federated-aggregation | Classical ML | flower | 3 | 3 | |
| ml-missing-data-imputation | Classical ML | scikit-learn | 5 | 3 | |
| ml-selective-deferral | Classical ML | scikit-learn | 4 | 4 | |
| ml-subgroup-calibration-shift | Classical ML | scikit-learn | 4 | 3 | |
| ml-symbolic-regression | Classical ML | gplearn | 3 | 3 | |
| mlsys-fused-attention | ML Systems | flash-attention | 3 | 3 | - |
| mlsys-moe-load-balance | ML Systems | eplb | 3 | 4 | - |
| mlsys-sparse-attention | ML Systems | SpargeAttn | 3 | 3 | - |
| optimization-bilevel | Optimization | penalized-bilevel-gradient-descent | 4 | 3 | |
| optimization-convex-concave | Optimization | RAIN | 4 | 3 | |
| optimization-diagonal-net | Optimization | RAIN | 4 | 4 | - |
| optimization-dp-sgd | Optimization | opacus | 4 | 3 | |
| optimization-evolution-strategy | Optimization | deap | 4 | 4 | |
| optimization-gradient-compression | Optimization | pytorch-vision | 3 | 3 | |
| optimization-hyperparameter-search | Optimization | scikit-learn | 6 | 3 | |
| optimization-multi-objective | Optimization | deap | 6 | 4 | |
| optimization-nas | Optimization | naslib | 3 | 3 | |
| optimization-online-bandit | Optimization | SMPyBandits | 3 | 3 | |
| optimization-pac-bayes-bound | Optimization | PBB | 3 | 3 | |
| optimization-parity | Optimization | pytorch-examples | 3 | 3 | |
| optimization-variance-reduction | Optimization | opt-vr-bench | 6 | 3 | |
| pde-autoregressive-solver | PDE Solvers | Neural-Solver-Library | 3 | 4 | - |
| pde-design-solver | PDE Solvers | Neural-Solver-Library | 3 | 3 | - |
| pde-steady-solver | PDE Solvers | Neural-Solver-Library | 3 | 3 | - |
| quant-concept-drift | Quantitative Finance | qlib | 3 | 3 | |
| quant-graph-stock | Quantitative Finance | qlib | 3 | 3 | |
| quant-stock-prediction | Quantitative Finance | qlib | 3 | 3 | |
| rl-gcrl-goal-representation | Reinforcement Learning | dual-goal-representations | 3 | 3 | - |
| rl-intrinsic-exploration | Reinforcement Learning | cleanrl | 3 | 3 | |
| rl-offline-adroit | Reinforcement Learning | CORL | 3 | 3 | |
| rl-offline-continuous | Reinforcement Learning | CORL | 3 | 3 | |
| rl-offline-discrete | Reinforcement Learning | d3rlpy | 4 | 3 | |
| rl-offline-off2on | Reinforcement Learning | CORL | 3 | 3 | - |
| rl-offline-policy | Reinforcement Learning | fql | 3 | 3 | - |
| rl-offline-pomdp | Reinforcement Learning | katakomba | 3 | 3 | |
| rl-offpolicy-continuous | Reinforcement Learning | cleanrl | 3 | 3 | |
| rl-offpolicy-sample-efficiency | Reinforcement Learning | FastTD3 | 3 | 3 | - |
| rl-onpolicy-continuous | Reinforcement Learning | cleanrl | 3 | 3 | |
| rl-reward-learning | Reinforcement Learning | imitation | 3 | 3 | |
| rl-value-atari | Reinforcement Learning | cleanrl | 3 | 3 | |
| rl-value-discrete | Reinforcement Learning | cleanrl | 3 | 3 | |
| robo-diffusion-guidance | Other | cleandiffuser | 3 | 3 | - |
| robo-diffusion-planner | Other | cleandiffuser | 3 | 3 | - |
| robo-diffusion-policy | Other | cleandiffuser | 3 | 3 | - |
| robo-diffusion-sampling-method | Other | cleandiffuser | 3 | 3 | - |
| robo-humanoid-sim2real-algo | Other | humanoid-gym | 3 | 4 | - |
| robo-humanoid-sim2real-reward | Other | humanoid-gym | 4 | 4 | - |
| robomimic-bc-loss | Other | robomimic | 3 | 3 | - |
| robomimic-iql-vf | Other | robomimic | 3 | 3 | - |
| robomimic-obs-encoder | Other | robomimic | 3 | 3 | - |
| safe-exploration-fixed-budget | Reinforcement Learning | omnisafe | 4 | 3 | - |
| safe-rl | Reinforcement Learning | omnisafe | 3 | 3 | |
| security-adversarial-attack-black-box-score | Adversarial ML | torchattacks | 3 | 5 | |
| security-adversarial-attack-sparse-l0 | Adversarial ML | torchattacks | 4 | 5 | |
| security-adversarial-attack-white-box-linf | Adversarial ML | torchattacks | 4 | 5 | |
| security-adversarial-training | Adversarial ML | torchattacks | 3 | 4 | |
| security-backdoor-defense | Adversarial ML | pytorch-vision | 4 | 3 | |
| security-machine-unlearning | Adversarial ML | pytorch-vision | 4 | 3 | |
| security-membership-inference-defense | Adversarial ML | pytorch-vision | 4 | 3 | |
| security-poison-robust-learning | Adversarial ML | pytorch-vision | 4 | 3 | |
| speech-asr-encoder | Speech Processing | speechbrain | 3 | 3 | - |
| speech-enhancement | Speech Processing | speechbrain | 3 | 3 | - |
| speech-vocoder | Speech Processing | speechbrain | 3 | 3 | - |
| stf-traffic-forecast | Time Series | BasicTS | 7 | 3 | - |
| tdmpc2-planning | Other | tdmpc2 | 3 | 3 | - |
| tdmpc2-simnorm | Other | tdmpc2 | 4 | 3 | - |
| ts-anomaly-detection | Time Series | Time-Series-Library | 3 | 3 | |
| ts-classification | Time Series | Time-Series-Library | 3 | 3 | |
| ts-exogenous-forecast | Time Series | Time-Series-Library | 4 | 3 | |
| ts-imputation | Time Series | Time-Series-Library | 3 | 3 | |
| ts-long-term-forecast | Time Series | Time-Series-Library | 5 | 3 | |
| ts-short-term-forecast | Time Series | Time-Series-Library | 4 | 3 | |
| ttt-memory | Deep Learning | titans-lmm | 3 | 3 | - |