Robo-Diffusion: Guided Sampling Strategy Design

Objective

Design better guided sampling strategies for diffusion models in robot decision-making. Your goal is to create guidance methods that can effectively steer the diffusion model to generate action sequences that satisfy specific conditions (e.g., achieving high rewards, reaching target states).

Background

Guided sampling is crucial for conditional generation in diffusion models. Two main approaches exist:

Classifier-Free Guidance (CFG): Trains a single model both with and without the condition, then combines the conditional and unconditional predictions at sampling time, weighted by w_cfg
Classifier Guidance (CG): Uses a separately trained classifier whose gradient with respect to the noisy sample steers the denoising process, weighted by w_cg

The choice of guidance strategy and its parameters significantly impacts the quality and diversity of generated actions.
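
To make the two approaches concrete, here is a minimal NumPy sketch of how each one modifies a noise prediction. It assumes one common CFG convention (w_cfg = 0 gives the unconditional prediction, w_cfg = 1 the conditional one, and larger values push past it); the codebase used in this task may define w_cfg differently. The function names and the sigma_t argument are illustrative, not part of the provided code.

```python
import numpy as np

def cfg_combine(eps_cond, eps_uncond, w_cfg):
    """Classifier-free guidance (assumed convention).

    w_cfg = 0 -> unconditional, w_cfg = 1 -> conditional,
    w_cfg > 1 -> extrapolate beyond the conditional prediction.
    """
    return eps_uncond + w_cfg * (eps_cond - eps_uncond)

def cg_combine(eps, grad_log_p, sigma_t, w_cg):
    """Classifier guidance on a noise prediction.

    grad_log_p is the gradient of log p(y | x_t) with respect to x_t,
    and sigma_t is the noise level sqrt(1 - alpha_bar_t) at this step.
    """
    return eps - w_cg * sigma_t * grad_log_p

# Toy predictions for a 2-dimensional sample.
eps_cond = np.array([1.0, 2.0])
eps_uncond = np.array([0.0, 1.0])

guided = cfg_combine(eps_cond, eps_uncond, w_cfg=2.0)
```

With w_cfg = 2.0 the result moves twice as far from the unconditional prediction as the conditional prediction does, which is the usual way CFG trades diversity for stronger condition following.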

Task Description

You will design a custom guidance strategy for a diffusion-based robot policy. The guidance function takes:

Current noisy state
Timestep
Condition information (observations, desired rewards)

And outputs:

Guidance signal to adjust the denoising process
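
As a rough illustration of this input/output contract, the sketch below defines a toy guidance function that pulls the sample toward a target. The name target_guidance, the "target" key, and the array shapes are hypothetical; the real signature is whatever apply_guidance in the provided file expects.

```python
import numpy as np

def target_guidance(x_t, t, cond):
    """Toy guidance signal with the same shape as x_t (hypothetical interface).

    x_t:  noisy sample, shape (batch, horizon, dim)
    t:    integer diffusion timestep (unused in this toy example, kept so
          the signature matches the inputs listed above)
    cond: dict of condition information; here only cond["target"], shape (dim,)
    """
    target = cond["target"]
    # Gradient of -0.5 * ||x_t - target||^2 with respect to x_t.
    return target - x_t

x_t = np.zeros((2, 4, 3))
cond = {"target": np.array([1.0, 0.0, -1.0])}
signal = target_guidance(x_t, t=10, cond=cond)
```

The returned array has one guidance vector per element of the noisy sample, so it can be scaled by a weight and added to the denoising update.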

What You Can Modify

You can modify the guidance strategy in the following ways:

Classifier-free guidance weight (w_cfg)
Classifier guidance weight (w_cg)
Custom guidance functions
Guidance application timestep ranges (e.g., only guide early/late steps)
Hybrid guidance strategies (combining CFG and CG)
Adaptive guidance (changing weights based on timestep or condition)
Custom classifier architectures (if using classifier guidance)
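
Two of these options, timestep ranges and adaptive weights, can be combined in a single weight schedule. The sketch below is one possible design, not a recommended setting: it assumes timestep num_steps - 1 is the noisiest step and 0 the final one, uses a linear ramp, and switches guidance off outside a chosen window. All names and defaults are illustrative.

```python
def adaptive_weight(t, num_steps, w_max, t_lo=0.0, t_hi=1.0):
    """Guidance weight as a function of the timestep (illustrative schedule).

    frac is the normalized timestep: 0 at the final denoising step and
    1 at the noisiest step. The weight is zero outside [t_lo, t_hi] and
    ramps linearly up to w_max inside it.
    """
    frac = t / (num_steps - 1)
    if frac < t_lo or frac > t_hi:
        return 0.0
    return w_max * frac

# Guide only during the noisier half of a 100-step sampling run.
weights = [adaptive_weight(t, 100, w_max=2.0, t_lo=0.5) for t in range(100)]
```

Reversing the ramp (w_max * (1 - frac)) would instead concentrate guidance on the late, low-noise steps; which direction works better is an empirical question for each environment.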

What Is Fixed

Network architecture: Chi_UNet1d (standard configuration)
Diffusion model: DDPM with cosine noise schedule
Training hyperparameters
Evaluation environments

Evaluation

Your guidance strategy will be evaluated on three D4RL MuJoCo environments:

hopper-medium-v2: Hopper robot locomotion
walker2d-medium-v2: Walker2d robot locomotion
halfcheetah-medium-v2: HalfCheetah robot locomotion

Metrics:

Normalized Score: D4RL normalized score (0 corresponds to a random policy and 100 to an expert policy; higher is better)
Sample Diversity: Diversity of generated actions (measured by the standard deviation of sampled actions)
Inference Time: Average inference time per action (lower is better)
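
The three metrics can be approximated as follows. This is a sketch under stated assumptions: the normalization is the standard D4RL formula, but the exact diversity computation (here, per-dimension standard deviation averaged over action dimensions) and the timing procedure used by the evaluator are not specified in this document, and the reference returns must come from D4RL itself.

```python
import time
import numpy as np

def normalized_score(raw_return, random_return, expert_return):
    """D4RL-style normalization: 0 = random policy, 100 = expert policy."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)

def sample_diversity(actions):
    """Mean per-dimension std of actions sampled for the same observation.

    actions: array of shape (num_samples, action_dim). Assumed definition.
    """
    return float(np.std(actions, axis=0).mean())

def mean_inference_time(policy_fn, obs, n_calls=10):
    """Average wall-clock seconds per call of policy_fn(obs)."""
    start = time.perf_counter()
    for _ in range(n_calls):
        policy_fn(obs)
    return (time.perf_counter() - start) / n_calls
```

Because the score is a linear rescaling, a policy that outperforms the expert reference scores above 100.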

Baselines

Three baseline guidance strategies are provided:

1. No Guidance

No guidance applied during sampling
Pure unconditional generation
Fastest inference, but may not follow conditions well

2. Classifier-Free Guidance (CFG)

Standard CFG with w_cfg = 1.0
Interpolates between conditional and unconditional predictions
Good balance between condition following and diversity
Paper: "Classifier-Free Diffusion Guidance" (Ho and Salimans)

3. Classifier Guidance (CG)

Uses a trained classifier to guide sampling
w_cg = 1.0 with cumulative reward classifier
Strong condition following, but may reduce diversity
Paper: "Diffusion Models Beat GANs on Image Synthesis" (Dhariwal and Nichol)

Tips

Consider the trade-off between condition following and diversity
Think about when guidance is most effective (early vs. late timesteps)
Adaptive guidance can be more effective than fixed weights
Combining CFG and CG can leverage benefits of both
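
The last tip can be sketched as a single hybrid update that first applies CFG and then adds a classifier-gradient correction. This assumes the CFG convention in which w_cfg = 1 recovers the conditional prediction; the function name and arguments are illustrative and do not come from the provided code.

```python
import numpy as np

def hybrid_guidance(eps_cond, eps_uncond, grad_log_p, sigma_t, w_cfg, w_cg):
    """Combine classifier-free and classifier guidance on a noise prediction.

    Step 1: CFG mixes the conditional and unconditional predictions.
    Step 2: CG shifts the result along the classifier gradient, scaled by
            the noise level sigma_t = sqrt(1 - alpha_bar_t).
    """
    eps = eps_uncond + w_cfg * (eps_cond - eps_uncond)
    return eps - w_cg * sigma_t * grad_log_p

eps_cond = np.array([1.0, 2.0])
eps_uncond = np.array([0.0, 1.0])
grad = np.array([0.5, -0.5])

eps_hybrid = hybrid_guidance(eps_cond, eps_uncond, grad, sigma_t=0.8, w_cfg=1.5, w_cg=1.0)
```

Setting w_cg = 0 reduces this to plain CFG and setting w_cfg = 1 reduces it to classifier guidance on the conditional model, so both baselines are special cases of the same function.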

File to Edit

cleandiffuser/pipelines/custom_guidance.py

You will implement your custom guidance strategy in the apply_guidance function.

robo-diffusion-guidance

Description