# Motus: A Unified Latent Action World Model

Paper: [arXiv:2512.13030](https://arxiv.org/abs/2512.13030)
Stage 1 pretrained WAN 2.2 (5B) video generation model for Motus. This checkpoint provides the video generation backbone, trained on multi-robot task trajectories, synthetic robot data, and egocentric human videos.
Homepage | GitHub | arXiv | Feishu | WeChat
### Model Specifications

| Component | Specification |
|---|---|
| Base Model | WAN 2.2 |
| Parameters | 5B |
| Precision | bfloat16 |

### Hardware Requirements

| Mode | VRAM | Example GPU |
|---|---|---|
| Inference | ~16 GB | RTX 4090 |
| Fine-Tuning | ~40 GB | A100 (40 GB) |
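As a quick sanity check before running inference or fine-tuning, you can compare your GPU's total memory against the table above. A minimal PyTorch sketch (the ~16 GB / ~40 GB thresholds are taken from the table, not from the Motus codebase):

```python
import torch

# Query total VRAM of GPU 0 and compare against the Hardware Requirements table.
if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0: {torch.cuda.get_device_name(0)}, {total_gb:.1f} GB VRAM")
    print("Inference (~16 GB):", "OK" if total_gb >= 16 else "insufficient")
    print("Fine-tuning (~40 GB):", "OK" if total_gb >= 40 else "insufficient")
else:
    print("No CUDA device detected.")
```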
Update your Motus config file (e.g., `configs/robotwin.yaml`):

```yaml
model:
  wan:
    checkpoint_path: "./pretrained_models/Motus_Wan2_2_5B_pretrain"  # This checkpoint
    config_path: "./pretrained_models/Motus_Wan2_2_5B_pretrain"
    vae_path: "./pretrained_models/Wan2.2-TI2V-5B/Wan2.2_VAE.pth"  # Local VAE (not included)
    precision: "bfloat16"
```
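To catch path mistakes early, you can load the config and verify that the checkpoint directory and the VAE file both exist. A minimal sketch (the keys match the YAML above, but this validation helper is not part of the Motus codebase):

```python
from pathlib import Path

import yaml

# Load the Motus config and sanity-check the paths from the snippet above.
with open("configs/robotwin.yaml") as f:
    cfg = yaml.safe_load(f)

wan = cfg["model"]["wan"]
assert Path(wan["checkpoint_path"]).is_dir(), "checkpoint directory missing"
# The WAN VAE ships separately (see the note below), so check it explicitly.
assert Path(wan["vae_path"]).is_file(), "Wan2.2_VAE.pth not found; download it separately"
print("Config paths look good; precision =", wan["precision"])
```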
```bash
# Using Hugging Face CLI
huggingface-cli download motus-robotics/Motus_Wan2_2_5B_pretrain --local-dir ./pretrained_models/Motus_Wan2_2_5B_pretrain

# Or using Git LFS
git lfs install
git clone https://huggingface.co/motus-robotics/Motus_Wan2_2_5B_pretrain
```
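The same download can also be scripted from Python with `huggingface_hub` (a small sketch; `snapshot_download` is the standard hub API, and the target directory mirrors the CLI command above):

```python
from huggingface_hub import snapshot_download

# Programmatic equivalent of the CLI download above.
snapshot_download(
    repo_id="motus-robotics/Motus_Wan2_2_5B_pretrain",
    local_dir="./pretrained_models/Motus_Wan2_2_5B_pretrain",
)
```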
The WAN VAE (`Wan2.2_VAE.pth`) is not included in this repository. You need to:

1. Download `Wan2.2_VAE.pth` separately (e.g., from the Wan2.2-TI2V-5B release, matching the path in the config above)
2. Set `vae_path` in your config to point to the local VAE file

### Citation

```bibtex
@misc{bi2025motusunifiedlatentaction,
      title={Motus: A Unified Latent Action World Model},
      author={Hongzhe Bi and Hengkai Tan and Shenghao Xie and Zeyuan Wang and Shuhe Huang and Haitian Liu and Ruowen Zhao and Yao Feng and Chendong Xiang and Yinze Rong and Hongyan Zhao and Hanyu Liu and Zhizhong Su and Lei Ma and Hang Su and Jun Zhu},
      year={2025},
      eprint={2512.13030},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.13030},
}
```