---
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
license: mit
pipeline_tag: video-text-to-text
library_name: transformers
---
This repository contains the model described in [Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence](https://huggingface.co/papers/2505.23747).

Project page: https://diankun-wu.github.io/Spatial-MLLM/

Code: https://github.com/diankun-wu/Spatial-MLLM