robomimic-obs-encoder
Tags: Other, robomimic, rigorous codebase
Description
Behavioral Cloning: Observation Encoder Design for Robot State Fusion
Research Question
Design an improved observation encoder that fuses multiple robot state modalities for behavioral cloning. In robot manipulation, observations consist of heterogeneous components (end-effector pose, gripper state, object state) that may benefit from non-trivial fusion strategies beyond simple concatenation.
What You Can Modify
The CustomObsEncoder class (lines 19-47) in custom_obs_encoder.py. This module receives a dictionary of observation tensors and must return a fused feature vector.
Interface:
- Input: obs_dict -- dictionary with keys:
  - robot0_eef_pos: [B, 3] end-effector position
  - robot0_eef_quat: [B, 4] end-effector quaternion orientation
  - robot0_gripper_qpos: [B, 2] gripper joint positions
  - object: [B, D_obj] object state
- Output: [B, output_dim] fused feature vector
- Required attribute: self.output_dim (int) -- dimensionality of the output
You may add parameters to __init__, define helper methods, and add learnable layers.
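To make the required interface concrete, here is a minimal sketch of an encoder that satisfies it, using plain concatenation followed by a linear projection. The class name `ConcatObsEncoder`, the `obs_shapes` constructor argument, and the hidden sizes are illustrative choices, not part of the task's actual `CustomObsEncoder`:

```python
import torch
import torch.nn as nn

class ConcatObsEncoder(nn.Module):
    """Minimal interface example: concatenate all modalities, then project.

    Hypothetical sketch; the real CustomObsEncoder in custom_obs_encoder.py
    may use a richer fusion strategy (attention, gating, etc.).
    """

    def __init__(self, obs_shapes, output_dim=64):
        super().__init__()
        # obs_shapes maps key -> per-sample feature dim,
        # e.g. {"robot0_eef_pos": 3, "robot0_eef_quat": 4, ...}
        self.keys = sorted(obs_shapes)  # fixed key order for concatenation
        in_dim = sum(obs_shapes[k] for k in self.keys)
        self.proj = nn.Sequential(nn.Linear(in_dim, output_dim), nn.ReLU())
        self.output_dim = output_dim    # required attribute

    def forward(self, obs_dict):
        # Concatenate [B, d_k] tensors along the feature axis -> [B, sum d_k]
        x = torch.cat([obs_dict[k] for k in self.keys], dim=-1)
        return self.proj(x)             # [B, output_dim]
```

Any replacement encoder only needs to preserve this contract: take the observation dictionary, return a `[B, output_dim]` tensor, and expose `self.output_dim`.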
Evaluation
- Metric: success_rate -- rollout success rate in the environment (higher is better)
- Tasks: Lift, Can, Square (robot manipulation with proficient human demonstrations)
- Dataset: ~200 proficient human demonstrations per task, low-dimensional observations
- Policy: GMM (Gaussian Mixture Model) with 5 mixture components, trained with NLL loss. On top of the encoder output, a 2-layer MLP backbone (1024, 1024 units, ReLU) feeds into GMM heads (means, log-stds, mixture logits)
- Training: 2000 epochs, Adam optimizer (lr=1e-4), batch size 100
- Rollout: 50 episodes per task, horizon 400 steps, every 50 epochs
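The GMM policy head described above can be sketched with `torch.distributions`. This is an assumed reconstruction for illustration only (the class name `GMMHead` and details such as the std parameterization are not from robomimic's actual BC_GMM implementation):

```python
import torch
import torch.nn as nn
import torch.distributions as D

class GMMHead(nn.Module):
    """Sketch: 2-layer MLP backbone feeding a 5-component GMM over actions.

    Hypothetical names and shapes; robomimic's BC_GMM differs in details.
    """

    def __init__(self, feat_dim, action_dim, num_modes=5, hidden=1024):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.num_modes, self.action_dim = num_modes, action_dim
        self.means = nn.Linear(hidden, num_modes * action_dim)
        self.log_stds = nn.Linear(hidden, num_modes * action_dim)
        self.logits = nn.Linear(hidden, num_modes)  # mixture weights

    def forward(self, feat):
        h = self.backbone(feat)
        B = feat.shape[0]
        mix = D.Categorical(logits=self.logits(h))
        comp = D.Independent(
            D.Normal(
                self.means(h).view(B, self.num_modes, self.action_dim),
                self.log_stds(h).view(B, self.num_modes, self.action_dim).exp(),
            ),
            1,  # treat the action dimension as one event
        )
        return D.MixtureSameFamily(mix, comp)

    def nll_loss(self, feat, actions):
        # Negative log-likelihood of demonstrated actions under the GMM
        return -self.forward(feat).log_prob(actions).mean()
```

Training minimizes `nll_loss` over demonstration batches; at rollout time, actions are sampled (or taken as the most likely mode) from the returned mixture distribution.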
Code
custom_obs_encoder.py
```python
"""
Custom Observation Encoder for multi-modal robot state fusion.

This module defines the observation encoder used by BC-GMM training
in robomimic. The encoder receives a dictionary of observation tensors
(end-effector pose, gripper state, object state) and returns a fused
feature vector that is fed into the MLP backbone and GMM heads.

The custom encoder is imported and used by the patched BC_GMM network.
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
```
Results
| Model | Type | Tool Hang (PH) success rate ↑ | Can (PH) success rate ↑ | Square (PH) success rate ↑ |
|---|---|---|---|---|
| attention_fusion | baseline | 0.113 | 0.833 | 0.680 |
| default | baseline | 0.147 | 0.853 | 0.733 |
| gated_fusion | baseline | 0.120 | 0.960 | 0.640 |