HyLaR: Hybrid Latent Reasoning with Decoupled Policy Optimization

We introduce HyLaR, a training framework that enables multimodal large language models (MLLMs) to perform hybrid latent reasoning: combining textual chain-of-thought with continuous visual latent representations. HyLaR uses a Canvas-in-Latents mechanism during supervised fine-tuning and a Decoupled Hybrid PPO algorithm during reinforcement learning, allowing the model to interleave discrete text reasoning with continuous latent visual thinking.

Safetensors · Model size: 8B params · Tensor type: BF16