BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration Paper • 2510.00438 • Published Oct 1, 2025 • 10
MobileCLIP2 Collection MobileCLIP2: Mobile-friendly image-text models with SOTA zero-shot capabilities trained on DFNDR-2B • 27 items • Updated 13 days ago • 58
FastVLM Collection Efficient Vision Encoding for Vision Language Models • 8 items • Updated 13 days ago • 109
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing Paper • 2508.10881 • Published Aug 14, 2025 • 52
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens Paper • 2508.01191 • Published Aug 2, 2025 • 238
EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion Paper • 2507.16535 • Published Jul 22, 2025 • 23
Seed-X Collection A powerful open-source multilingual translation language model series, including instruction and reasoning models. • 8 items • Updated Aug 22, 2025 • 67
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation Paper • 2506.21416 • Published Jun 26, 2025 • 28
MedGemma Release Collection Collection of Gemma 3 variants for performance on medical text and image comprehension to accelerate building healthcare-based AI applications. • 9 items • Updated 3 days ago • 452
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 6 items • Updated 13 days ago • 164
SkyReels-A2: Compose Anything in Video Diffusion Transformers Paper • 2504.02436 • Published Apr 3, 2025 • 39
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Paper • 2503.11647 • Published Mar 14, 2025 • 148
Wan2.1 14B 480p I2V LoRAs Collection A collection of Remade's Wan2.1 14B 480p I2V LoRAs • 49 items • Updated May 24, 2025 • 209
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 12 items • Updated Dec 23, 2025 • 150