Fixed-Budget Diffusion Sampler Updates

Studies how latent diffusion sampling updates improve text-image alignment under a fixed inference-step budget.

Vision & Generation · CFGpp-main
cv-diffusion-efficiency

Description

Diffusion Model: Sampler Efficiency Optimization

Objective

Design a sampling algorithm for text-to-image diffusion models that achieves high generation quality with a fixed budget of NFE = 20 denoiser evaluations.

Background

Diffusion models generate images by iteratively denoising from random noise. Different samplers differ in how they update the latent after each model prediction. The general structure of one step is:

for step, t in enumerate(timesteps):
    # 1. Predict noise.
    noise_pred = model(zt, t, text_embedding)
    # 2. Estimate clean image (Tweedie's formula).
    z0t = (zt - sigma_t * noise_pred) / alpha_t
    # 3. Update to next step (this differs across samplers).
    zt_next = update_rule(zt, z0t, noise_pred, t, t_next)
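As a concrete instance of step 3, the deterministic DDIM update can be sketched as below. This is a minimal illustration, not the repository's implementation: the function name `ddim_step` and its arguments are hypothetical, and it assumes the variance-preserving convention alpha_t = sqrt(abar_t), sigma_t = sqrt(1 - abar_t), where abar_t is the cumulative alpha product.

```python
import math


def ddim_step(zt, noise_pred, abar_t, abar_next):
    """One deterministic DDIM update (eta = 0).

    abar_t / abar_next are the cumulative alpha products at the current and
    next timestep (plain floats, an assumption of this sketch). zt and
    noise_pred may be floats, NumPy arrays, or tensors.
    """
    alpha_t, sigma_t = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    alpha_n, sigma_n = math.sqrt(abar_next), math.sqrt(1.0 - abar_next)
    # Step 2 of the loop: estimate of the clean latent.
    z0t = (zt - sigma_t * noise_pred) / alpha_t
    # Step 3: re-noise the clean estimate to the next noise level,
    # reusing the same predicted noise.
    return alpha_n * z0t + sigma_n * noise_pred
```

If the noise prediction were exact, this update would land exactly on the latent at the next noise level; in practice the denoiser error is what higher-order samplers try to reduce.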

Reference families:

  • DDIM (Song et al., ICLR 2021, arXiv:2010.02502) — first-order ODE solver, deterministic, simple update rule.
  • DPM-Solver++ (Lu et al., 2022, arXiv:2211.01095) — high-order solvers for the diffusion ODE in data-prediction form.
    • DPM-Solver++(2M) — second-order multistep variant, reuses the previous denoiser output.
    • DPM-Solver++(2S) — second-order singlestep variant, smaller high-order error constant.
    • DPM-Solver++(3M) SDE — third-order multistep stochastic variant for guided sampling.
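To make the multistep idea concrete, here is a hedged sketch of a DPM-Solver++(2M) update in data-prediction form, following the published algorithm as I understand it. The function name and argument layout are hypothetical, the abar values are assumed to be plain floats strictly between 0 and 1, and the repository's baselines may be parameterized differently (for example in Karras sigma space).

```python
import math


def dpmpp_2m_step(zt, z0t, z0_prev, abar_prev, abar_t, abar_next):
    """One DPM-Solver++(2M) update using the previous clean estimate.

    z0t is the current clean-latent estimate, z0_prev the one from the
    previous step (None on the first step, which falls back to first order).
    All abar_* must lie strictly in (0, 1) so the log-SNR is finite.
    """
    def log_snr_half(abar):
        # lambda = log(alpha / sigma)
        return math.log(math.sqrt(abar) / math.sqrt(1.0 - abar))

    sigma_t = math.sqrt(1.0 - abar_t)
    alpha_n, sigma_n = math.sqrt(abar_next), math.sqrt(1.0 - abar_next)
    h = log_snr_half(abar_next) - log_snr_half(abar_t)

    if z0_prev is None:
        d = z0t  # first-order step, equivalent to DDIM
    else:
        h_prev = log_snr_half(abar_t) - log_snr_half(abar_prev)
        r = h_prev / h
        # Linear extrapolation of the clean estimate from the two latest values.
        d = (1.0 + 1.0 / (2.0 * r)) * z0t - (1.0 / (2.0 * r)) * z0_prev

    return (sigma_n / sigma_t) * zt - alpha_n * math.expm1(-h) * d
```

The second-order correction costs no extra denoiser call because it only reuses the stored previous estimate, which is why multistep variants fit a fixed NFE budget well.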

A useful method may use time-dependent coefficients, history (multistep), predictor-corrector structure, or guidance-aware renoising — but it must respect the fixed function-evaluation budget.

Implementation Contract

Implement the update rule for both Stable Diffusion v1.5 and SDXL by editing the marked editable regions of two files:

  1. latent_diffusion.py: BaseDDIMCFGpp class for SD v1.5 (sample() method). Available helpers: self.get_text_embed(), self.initialize_latent(), self.predict_noise(), self.alpha(t).
  2. latent_sdxl.py: BaseDDIMCFGpp class for SDXL (reverse_process() method). Available helpers: self.initialize_latent(size=...), self.predict_noise(), self.scheduler.alphas_cumprod[t].

The contribution must respect a fixed budget of NFE = 20 denoiser calls per sample.
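One way to check the budget while developing is to wrap the denoiser in a counter. The sketch below is purely illustrative: `BudgetedDenoiser` is a hypothetical helper, not part of the codebase, and it counts each call to the wrapped callable as one evaluation. How the task counts batched conditional and unconditional passes is not specified here.

```python
class BudgetedDenoiser:
    """Wraps a denoiser callable and refuses calls beyond a fixed NFE budget."""

    def __init__(self, denoiser, max_nfe=20):
        self.denoiser = denoiser
        self.max_nfe = max_nfe
        self.nfe = 0  # number of evaluations performed so far

    def __call__(self, *args, **kwargs):
        if self.nfe >= self.max_nfe:
            raise RuntimeError(
                f"NFE budget exceeded: {self.max_nfe} denoiser calls already used"
            )
        self.nfe += 1
        return self.denoiser(*args, **kwargs)
```

For example, wrapping the noise-prediction function before the sampling loop makes a singlestep method that accidentally uses two evaluations per step fail fast instead of silently exceeding 20 calls.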

Baselines

Baseline     Description
ddim         DDIM (Song et al., ICLR 2021, arXiv:2010.02502). First-order deterministic.
dpm3m_sde    DPM-Solver++(3M) SDE multistep variant (Lu et al., 2022, arXiv:2211.01095).
dpm2s        DPM-Solver++(2S) second-order singlestep variant (same paper).

Fixed Pipeline

  • Models: Stable Diffusion v1.5 and SDXL (frozen weights).
  • Prompt set: shared evaluation prompts across all baselines.
  • NFE budget: 20 denoiser calls per sample.

Evaluation

Evaluation runs text-to-image sampling on the model variants above. Metrics reported:

  • CLIP score (cosine similarity between generated image and text prompt; higher is better).
  • FID computed against a reference image set (lower is better).
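The cosine similarity underlying the CLIP score can be sketched on plain vectors as follows. This only illustrates the similarity computation: the real metric uses image and text embeddings from a CLIP encoder (and possibly a scaling factor), none of which is reproduced here, and the function name is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Identical directions give 1, orthogonal embeddings give 0, so a higher value indicates that the image embedding points closer to the prompt embedding.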

Task scoring uses per-variant FID (lower is better). The method should improve image quality across variants without changing prompts, model weights, allowed function-evaluation budget, or metric computation.

Code

latent_diffusion.py
"""
This module includes LDM-based inverse problem solvers.
Forward operators follow DPS and DDRM/DDNM.
"""

from typing import Any, Callable, Dict, Optional

import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline
from tqdm import tqdm

####### Factory #######
__SOLVER__ = {}

def register_solver(name: str):
latent_sdxl.py
from typing import Any, Optional, Tuple
import os
from safetensors.torch import load_file

import torch
from diffusers import AutoencoderKL, DDIMScheduler, StableDiffusionXLPipeline, UNet2DConditionModel, EulerDiscreteScheduler
from diffusers.models.attention_processor import (AttnProcessor2_0,
                                                  LoRAAttnProcessor2_0,
                                                  LoRAXFormersAttnProcessor,
                                                  XFormersAttnProcessor)
from tqdm import tqdm
from latent_diffusion import get_sigmas_karras, get_ancestral_step, append_zero

####### Factory #######
__SOLVER__ = {}

Method Summary

Auto-summarized from each method's code by an LLM reviewer, not the model's original output. The Code section is independent.
Claude Opus 4.6 · Formula · high

DPM++3M-SDE with cosine eta anneal

DPM-Solver++(3M) SDE on Karras sigmas, but eta cosine-anneals from 1.5 down to 0.5 across the 20 steps to balance early diversity and late detail.

\[
\eta_i = \eta_{\min} + \tfrac{1}{2}(\eta_{\max}-\eta_{\min})\left(1 + \cos\left(\pi\,\tfrac{i}{N-1}\right)\right),\quad h_\eta = h(\eta+1)
\]
Δ vs. baseline: Same DPM++3M SDE multistep update as the dpm3m_sde baseline, but replaces the constant eta=1.2 with a cosine schedule running from 1.5 to 0.5 over the steps, applied to both SD and SDXL.
Hyperparameters: eta_max=1.5, eta_min=0.5, rho=7.0, order=3. The schedule matches the DPM++3M SDE baseline value eta=1.2 about 37% of the way through the trajectory; at the midpoint eta=1.0.
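The eta schedule described in this summary can be sketched directly from the formula. This is a minimal illustration assuming a zero-based step index i and N = 20 steps; the function name is hypothetical and the submitted method's code may differ in detail.

```python
import math


def cosine_eta_schedule(num_steps=20, eta_max=1.5, eta_min=0.5):
    """Per-step eta values, cosine-annealed from eta_max down to eta_min."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    return [
        eta_min
        + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * i / (num_steps - 1)))
        for i in range(num_steps)
    ]
```

With the default arguments the schedule starts at 1.5, ends at 0.5, and decreases monotonically in between; each value would then enter the SDE step through h_eta = h * (eta + 1).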

Results