vHeat: Building Vision Models upon Heat Conduction • Paper • 2405.16555 • Published May 26, 2024 • 2
DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution • Paper • 2405.16071 • Published May 25, 2024 • 3
ControlCap: Controllable Region-level Captioning • Paper • 2401.17910 • Published Jan 31, 2024 • 1
Balancing Understanding and Generation in Discrete Diffusion Models • Paper • 2602.01362 • Published 6 days ago • 13
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models • Paper • 2602.02185 • Published 5 days ago • 123
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models • Paper • 2601.22060 • Published 9 days ago • 147
DocReward: A Document Reward Model for Structuring and Stylizing • Paper • 2510.11391 • Published Oct 13, 2025 • 27