VAMOS: A Hierarchical Vision-Language-Action Model for Capab

This collection (7 items) contains VLM planner checkpoints, affordance module checkpoints for Spot and HOUND, training datasets, and a demo.
This model is a merged LoRA fine-tune of google/paligemma2-3b-pt-224, trained on the mateoguaman/vamos_navigation_only_dataset dataset using TRL.

Note that this model was NOT trained with language annotations, so it is not steerable: it performs only point-based navigation, without preference conditioning.
Coming Soon
This model is a fine-tuned derivative of google/paligemma2-3b-pt-224 and is subject to the Gemma Terms of Use. The training data includes content licensed under CC BY-NC 4.0, so this model and its outputs are provided for non-commercial use only. Please see the accompanying LICENSE and NOTICE files for full details.
Base model: google/paligemma2-3b-pt-224