robo-diffusion-guidance

Other · cleandiffuser · rigorous codebase

Description

Robo-Diffusion: Guided Sampling Strategy Design

Objective

Design better guided sampling strategies for diffusion models in robot decision-making. Your goal is to create guidance methods that can effectively steer the diffusion model to generate action sequences that satisfy specific conditions (e.g., achieving high rewards, reaching target states).

Background

Guided sampling is crucial for conditional generation in diffusion models. Two main approaches exist:

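  1. Classifier-free Guidance (CFG): Trains a single model both with and without conditions (the condition is randomly dropped during training), then combines the conditional and unconditional predictions at sampling time, with a weight that controls how strongly the condition is enforced
  2. Classifier Guidance (CG): Uses a separately trained classifier whose gradient with respect to the noisy sample steers the denoising process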

The choice of guidance strategy and its parameters significantly impacts the quality and diversity of generated actions.
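The CFG combination described above can be sketched in a few lines. This is a minimal illustration, not the codebase's implementation: the function name `cfg_combine` is hypothetical, and the convention assumed here is that `w_cfg = 0` gives the unconditional prediction, `w_cfg = 1` gives the purely conditional one, and `w_cfg > 1` extrapolates past the conditional prediction. Check the convention used by the actual pipeline before relying on it.

```python
def cfg_combine(eps_cond, eps_uncond, w_cfg):
    """Blend conditional and unconditional noise predictions.

    Assumed convention (may differ from the real codebase):
      w_cfg = 0 -> unconditional, w_cfg = 1 -> conditional,
      w_cfg > 1 -> extrapolate beyond the conditional prediction.
    Works on floats or on any array type that supports arithmetic.
    """
    return w_cfg * eps_cond + (1.0 - w_cfg) * eps_uncond


# Toy scalar "predictions" for illustration only.
eps_cond, eps_uncond = 1.0, 0.5
for w in (0.0, 1.0, 2.0):
    print(w, cfg_combine(eps_cond, eps_uncond, w))
```

Larger weights push samples harder toward the condition, which is where the quality versus diversity trade-off comes from.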

Task Description

You will design a custom guidance strategy for a diffusion-based robot policy. The guidance function takes:

  • Current noisy state
  • Timestep
  • Condition information (observations, desired rewards)

It outputs:

  • Guidance signal to adjust the denoising process

What You Can Modify

You can modify the guidance strategy in the following ways:

  • Classifier-free guidance weight (w_cfg)
  • Classifier guidance weight (w_cg)
  • Custom guidance functions
  • Guidance application timestep ranges (e.g., only guide early/late steps)
  • Hybrid guidance strategies (combining CFG and CG)
  • Adaptive guidance (changing weights based on timestep or condition)
  • Custom classifier architectures (if using classifier guidance)
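As one possible way to combine timestep ranges with adaptive weights, the sketch below gates guidance to a window of the diffusion schedule and ramps the weight inside that window. Everything here is an assumption made for illustration: the name `scheduled_weight`, the half-cosine ramp, and the convention that step 0 is the nearly clean end while the last step is pure noise.

```python
import math


def scheduled_weight(t, num_steps, w_max, t_lo=0.0, t_hi=1.0):
    """Guidance weight at diffusion step t in [0, num_steps - 1].

    The weight is zero outside the normalized window [t_lo, t_hi] and
    follows a half-cosine ramp inside it, reaching w_max at the noisy
    end of the window (an arbitrary choice for this sketch).
    """
    frac = t / max(num_steps - 1, 1)  # 0 = nearly clean, 1 = pure noise
    if frac < t_lo or frac > t_hi:
        return 0.0
    pos = (frac - t_lo) / max(t_hi - t_lo, 1e-8)  # position inside window
    return w_max * 0.5 * (1.0 - math.cos(math.pi * pos))


# Guide only the noisier half of a 100-step schedule.
for t in (10, 50, 75, 99):
    print(t, round(scheduled_weight(t, 100, w_max=2.0, t_lo=0.5), 3))
```

The returned value can be used in place of a fixed `w_cfg` or `w_cg` at each sampling step; reversing the ramp gives a schedule that emphasizes late (low-noise) steps instead.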

What Is Fixed

  • Network architecture: Chi_UNet1d (standard configuration)
  • Diffusion model: DDPM with cosine noise schedule
  • Training hyperparameters
  • Evaluation environments

Evaluation

Your guidance strategy will be evaluated on three D4RL MuJoCo environments:

  1. hopper-medium-v2: Hopper robot locomotion
  2. walker2d-medium-v2: Walker2d robot locomotion
  3. halfcheetah-medium-v2: HalfCheetah robot locomotion

Metrics:

  • Normalized Score: D4RL normalized score (0 corresponds to a random policy and 100 to an expert policy; higher is better)
  • Sample Diversity: Diversity of generated actions (measured by standard deviation)
  • Inference Time: Average inference time per action (lower is better)
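For reference, the first two metrics can be sketched as follows. The normalized score follows the usual D4RL definition, rescaling an episode return between a random-policy and an expert-policy reference return. The diversity function is only one plausible reading of "measured by std" (mean per-dimension standard deviation over actions sampled for the same observation); the evaluation code may compute it differently. The function names and the reference returns in the example are placeholders, not real environment values.

```python
import statistics


def normalized_score(episode_return, random_return, expert_return):
    """D4RL-style score: 0 = random policy, 100 = expert policy."""
    return 100.0 * (episode_return - random_return) / (expert_return - random_return)


def action_diversity(samples):
    """Mean per-dimension population std over a batch of sampled actions.

    `samples` is a list of equally sized action vectors drawn for the
    same observation (an assumed definition, see the note above).
    """
    dims = list(zip(*samples))  # regroup values by action dimension
    return sum(statistics.pstdev(d) for d in dims) / len(dims)


# Placeholder reference returns, chosen only to make the arithmetic obvious.
print(normalized_score(50.0, random_return=0.0, expert_return=100.0))
print(action_diversity([[0.0, 1.0], [2.0, 1.0]]))
```

Because the score is a linear rescaling, a policy that outperforms the expert reference scores above 100, and one worse than random scores below 0.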

Baselines

Three baseline guidance strategies are provided:

1. No Guidance

  • No guidance applied during sampling
  • Pure unconditional generation
  • Fastest inference, but may not follow conditions well

2. Classifier-Free Guidance (CFG)

  • Standard CFG with w_cfg = 1.0
  • Interpolates between conditional and unconditional predictions
  • Good balance between condition following and diversity
  • Paper: "Classifier-Free Diffusion Guidance"

3. Classifier Guidance (CG)

  • Uses a trained classifier to guide sampling
  • w_cg = 1.0 with a cumulative-reward classifier
  • Strong condition following, but may reduce diversity
  • Paper: "Diffusion Models Beat GANs on Image Synthesis"
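To make the classifier-guidance mechanism concrete, here is a small sketch of the noise-prediction form of the update, in which the predicted noise is shifted against the gradient of the classifier's log-probability, scaled by sqrt(1 - alpha_bar_t). The classifier is replaced by a toy quadratic log-probability with an analytic gradient so that the example runs without an autodiff library; `toy_log_prob_grad`, `cg_adjust`, and the `target` argument are hypothetical names, and the real baseline uses a learned cumulative-reward classifier instead.

```python
import math


def toy_log_prob_grad(x, target):
    """Gradient w.r.t. x of the toy log-probability -0.5 * (x - target)**2."""
    return target - x


def cg_adjust(eps, x_t, alpha_bar_t, w_cg, target):
    """Shift a noise prediction using the classifier gradient.

    eps_hat = eps - w_cg * sqrt(1 - alpha_bar_t) * grad_x log p(y | x_t)
    """
    grad = toy_log_prob_grad(x_t, target)
    return eps - w_cg * math.sqrt(1.0 - alpha_bar_t) * grad


# Scalar toy example: the classifier prefers samples near target = 2.0.
print(cg_adjust(eps=0.5, x_t=0.0, alpha_bar_t=0.75, w_cg=1.0, target=2.0))
```

Lowering the predicted noise along the gradient direction moves the implied clean sample toward regions the classifier scores highly; setting `w_cg = 0` recovers the unguided prediction.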

Tips

  • Consider the trade-off between condition following and diversity
  • Think about when guidance is most effective (early vs. late timesteps)
  • Adaptive guidance can be more effective than fixed weights
  • Combining CFG and CG can leverage benefits of both

File to Edit

cleandiffuser/pipelines/custom_guidance.py

You will implement your custom guidance strategy in the apply_guidance function.
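As a starting point, a hybrid strategy could look roughly like the sketch below. The signature is an assumption: the real `apply_guidance` in `custom_guidance.py` may take different arguments, and `eps_model`, `log_prob_grad`, and `alpha_bar` are hypothetical stand-ins for the denoising network, the classifier gradient, and the cumulative noise schedule. The toy callables at the bottom exist only to make the example runnable.

```python
import math


def apply_guidance(x_t, t, condition, eps_model, log_prob_grad, alpha_bar,
                   w_cfg=1.0, w_cg=0.0):
    """Hypothetical hybrid guidance: CFG blend, then a classifier shift."""
    eps_cond = eps_model(x_t, t, condition)
    eps_uncond = eps_model(x_t, t, None)  # None = condition dropped
    eps = w_cfg * eps_cond + (1.0 - w_cfg) * eps_uncond
    if w_cg != 0.0:
        scale = math.sqrt(1.0 - alpha_bar[t])
        eps = eps - w_cg * scale * log_prob_grad(x_t, t, condition)
    return eps


# Toy stand-ins so the sketch runs without a trained model.
def toy_eps_model(x, t, cond):
    return x + (cond if cond is not None else 0.0)


def toy_log_prob_grad(x, t, cond):
    return cond - x


toy_alpha_bar = [0.99, 0.75]  # illustrative values, not a real schedule
print(apply_guidance(1.0, 1, 2.0, toy_eps_model, toy_log_prob_grad,
                     toy_alpha_bar, w_cfg=1.0, w_cg=1.0))
```

Keeping the two weights as separate arguments makes it straightforward to replace either with a timestep-dependent schedule later.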
