robo-diffusion-guidance
Description
Robo-Diffusion: Guided Sampling Strategy Design
Objective
Design better guided sampling strategies for diffusion models in robot decision-making. Your goal is to create guidance methods that can effectively steer the diffusion model to generate action sequences that satisfy specific conditions (e.g., achieving high rewards, reaching target states).
Background
Guided sampling is crucial for conditional generation in diffusion models. Two main approaches exist:
- Classifier-free Guidance (CFG): Trains a single model both with and without the condition, then combines the conditional and unconditional predictions at sampling time
- Classifier Guidance (CG): Uses a separate classifier to guide the diffusion process via gradients
The choice of guidance strategy and its parameters significantly impacts the quality and diversity of generated actions.
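As a rough illustration of the two approaches, the sketch below shows how each one modifies a noise prediction. It assumes the convention `eps = w_cfg * eps_cond + (1 - w_cfg) * eps_uncond` for CFG (so `w_cfg = 0` is unconditional, `w_cfg = 1` is purely conditional, and `w_cfg > 1` extrapolates past the conditional prediction) and the usual gradient shift scaled by the noise level for CG. The function names, the toy inputs, and the quadratic "classifier" are assumptions made for this example and are not taken from the task code; other sources parameterize the CFG weight differently.

```python
import numpy as np

def cfg_combine(eps_cond, eps_uncond, w_cfg):
    """Blend conditional and unconditional noise predictions (assumed convention)."""
    return w_cfg * eps_cond + (1.0 - w_cfg) * eps_uncond

def cg_adjust(eps, grad_log_p, sigma_t, w_cg):
    """Shift a noise prediction along the classifier gradient.

    sigma_t stands for sqrt(1 - alpha_bar_t), the noise level at step t.
    """
    return eps - w_cg * sigma_t * grad_log_p

# Toy inputs, invented for illustration only.
eps_cond = np.array([1.0, 2.0])
eps_uncond = np.array([0.0, 0.0])
print(cfg_combine(eps_cond, eps_uncond, w_cfg=2.0))

# Toy classifier: log p(y | x) = -0.5 * ||x - target||^2, so the gradient is (target - x).
x_t = np.array([0.0, 0.0])
target = np.array([1.0, -1.0])
grad_log_p = target - x_t
print(cg_adjust(np.array([0.5, 0.5]), grad_log_p, sigma_t=0.5, w_cg=2.0))
```

In a real pipeline the gradient would come from differentiating a trained classifier with respect to the noisy input rather than from a closed-form expression.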
Task Description
You will design a custom guidance strategy for a diffusion-based robot policy. The guidance function takes:
- Current noisy state
- Timestep
- Condition information (observations, desired rewards)
And outputs:
- Guidance signal to adjust the denoising process
What You Can Modify
You can modify the guidance strategy in the following ways:
- Classifier-free guidance weight (w_cfg)
- Classifier guidance weight (w_cg)
- Custom guidance functions
- Guidance application timestep ranges (e.g., only guide early/late steps)
- Hybrid guidance strategies (combining CFG and CG)
- Adaptive guidance (changing weights based on timestep or condition)
- Custom classifier architectures (if using classifier guidance)
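Timestep ranges and adaptive weights from the list above can be expressed as a single weight schedule. The following is a minimal sketch, assuming diffusion steps are indexed so that `t = num_steps - 1` is the noisiest step and `t = 0` is the final one. The function name `guidance_weight`, its parameters, and the cosine ramp are hypothetical choices for illustration; whether stronger guidance helps more at early or late steps is an empirical question for each environment.

```python
import math

def guidance_weight(t, num_steps, w_max=2.0, w_min=0.0, active_range=(0.0, 1.0)):
    """Return a guidance weight for diffusion step t (hypothetical helper).

    frac = 1 is the noisiest step, frac = 0 is the final denoising step.
    Outside active_range (expressed in frac units) guidance is switched off.
    """
    frac = t / max(num_steps - 1, 1)
    lo, hi = active_range
    if not (lo <= frac <= hi):
        return 0.0
    # Cosine ramp: w_min at frac = 0, w_max at frac = 1.
    ramp = 0.5 * (1.0 - math.cos(math.pi * frac))
    return w_min + (w_max - w_min) * ramp

# Example: 11 steps, guidance only during the noisier half of sampling.
schedule = [guidance_weight(t, 11, active_range=(0.5, 1.0)) for t in range(11)]
print(schedule)
```

The returned value can be used as either `w_cfg` or `w_cg` at each step; reversing the ramp gives a schedule that guides more strongly near the end of sampling instead.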
What Is Fixed
- Network architecture: Chi_UNet1d (standard configuration)
- Diffusion model: DDPM with cosine noise schedule
- Training hyperparameters
- Evaluation environments
Evaluation
Your guidance strategy will be evaluated on three D4RL MuJoCo environments:
- hopper-medium-v2: Hopper robot locomotion
- walker2d-medium-v2: Walker2d robot locomotion
- halfcheetah-medium-v2: HalfCheetah robot locomotion
Metrics:
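- Normalized Score: D4RL normalized score (higher is better; 0 corresponds to a random policy and 100 to an expert policy, so values can fall slightly outside the 0-100 range)
- Sample Diversity: Diversity of generated actions (measured by standard deviation)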
- Inference Time: Average inference time per action (lower is better)
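For orientation, the first two metrics can be sketched as follows. The normalized score uses the standard D4RL formula, which rescales a raw return between a random-policy and an expert-policy reference return. The diversity function is only one plausible reading of "measured by std" (per-dimension standard deviation across samples drawn for the same observation, averaged over action dimensions); the actual evaluation code may compute it differently. All numbers below are placeholders, not real reference scores.

```python
from statistics import pstdev

def normalized_score(raw_return, random_return, expert_return):
    """D4RL-style normalization: 0 = random policy, 100 = expert policy."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)

def sample_diversity(actions):
    """Mean per-dimension standard deviation across sampled actions (assumed definition).

    actions: list of action vectors sampled for the same observation.
    """
    dims = list(zip(*actions))  # regroup values by action dimension
    return sum(pstdev(d) for d in dims) / len(dims)

# Placeholder values for illustration only.
print(normalized_score(raw_return=50.0, random_return=0.0, expert_return=100.0))
print(sample_diversity([[0.0, 0.0], [2.0, 2.0]]))
```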
Baselines
Three baseline guidance strategies are provided:
1. No Guidance
- No guidance applied during sampling
- Pure unconditional generation
- Fastest inference, but may not follow conditions well
2. Classifier-Free Guidance (CFG)
- Standard CFG with w_cfg = 1.0
- Interpolates between conditional and unconditional predictions
- Good balance between condition following and diversity
- Paper: "Classifier-Free Diffusion Guidance"
3. Classifier Guidance (CG)
- Uses a trained classifier to guide sampling
- w_cg = 1.0 with cumulative reward classifier
- Strong condition following, but may reduce diversity
- Paper: "Diffusion Models Beat GANs on Image Synthesis"
Tips
- Consider the trade-off between condition following and diversity
- Think about when guidance is most effective (early vs. late timesteps)
- Adaptive guidance can be more effective than fixed weights
- Combining CFG and CG can leverage benefits of both
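One way to act on the last two tips is a hybrid rule that applies CFG at every step and adds a classifier-gradient term whose weight decays as sampling proceeds. The sketch below is an assumption-laden illustration: `hybrid_guidance` and its arguments are hypothetical and do not reflect the real signature of `apply_guidance`, the CFG convention is `w_cfg * eps_cond + (1 - w_cfg) * eps_uncond`, and steps are indexed so that `t = num_steps - 1` is the noisiest.

```python
import numpy as np

def hybrid_guidance(eps_cond, eps_uncond, grad_log_p, sigma_t, t, num_steps,
                    w_cfg=1.0, w_cg=1.0):
    """Combine CFG with a linearly decayed classifier-guidance term (hypothetical).

    frac = 1 at the noisiest step and 0 at the final step, so the classifier
    term fades out as the sample becomes clean.
    """
    frac = t / max(num_steps - 1, 1)
    eps = w_cfg * eps_cond + (1.0 - w_cfg) * eps_uncond
    return eps - (w_cg * frac) * sigma_t * grad_log_p

# Toy inputs, invented for illustration only.
eps_cond = np.array([1.0, 1.0])
eps_uncond = np.array([0.0, 0.0])
grad_log_p = np.array([2.0, 0.0])
for t in (4, 2, 0):
    print(t, hybrid_guidance(eps_cond, eps_uncond, grad_log_p, sigma_t=0.5,
                             t=t, num_steps=5, w_cfg=1.5, w_cg=1.0))
```

A hybrid of this kind needs both a conditional and an unconditional model evaluation plus a classifier gradient at each guided step, so any gain in score should be weighed against the inference-time metric.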
File to Edit
cleandiffuser/pipelines/custom_guidance.py
You will implement your custom guidance strategy in the apply_guidance function.