iamluokai (luokai)

upvoted a paper 3 months ago

WonderZoom: Multi-Scale 3D World Generation

Paper • 2512.09164 • Published Dec 9, 2025 • 12

upvoted a paper 4 months ago

BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration

Paper • 2510.00438 • Published Oct 1, 2025 • 10

upvoted 2 collections 7 months ago

MobileCLIP2

Collection

MobileCLIP2: Mobile-friendly image-text models with SOTA zero-shot capabilities trained on DFNDR-2B • 27 items • Updated 13 days ago • 58

FastVLM

Collection

Efficient Vision Encoding for Vision Language Models • 8 items • Updated 13 days ago • 109

upvoted 3 papers 7 months ago

upvoted a collection 8 months ago

Seed-X

Collection

A powerful open-source multilingual translation language model series, including instruction and reasoning models. • 8 items • Updated Aug 22, 2025 • 67

upvoted a paper 8 months ago

RoboBrain 2.0 Technical Report

Paper • 2507.02029 • Published Jul 2, 2025 • 35

upvoted a paper 9 months ago

XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

Paper • 2506.21416 • Published Jun 26, 2025 • 28

upvoted a collection 9 months ago

ERNIE 4.5

Collection

Collection of ERNIE 4.5 models. • 27 items • Updated Nov 11, 2025 • 183

upvoted an article 9 months ago

Article

🤔👀🎬🖥️📖 Kimi-VL-A3B-Thinking-2506: A Quick Navigation

Jun 21, 2025 • 77

upvoted a collection 9 months ago

MedGemma Release

Collection

Collection of Gemma 3 variants for performance on medical text and image comprehension to accelerate building healthcare-based AI applications. • 9 items • Updated 3 days ago • 452

upvoted a collection 10 months ago

Qwen2.5-Omni

Collection

End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 6 items • Updated 13 days ago • 164

upvoted 2 collections 11 months ago

Qwen3

Collection

84 items • Updated Dec 31, 2025 • 1.72k

InternVL3

Collection

33 items • Updated 13 days ago • 84

upvoted 2 papers 12 months ago

SkyReels-A2: Compose Anything in Video Diffusion Transformers

Paper • 2504.02436 • Published Apr 3, 2025 • 39

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Paper • 2503.11647 • Published Mar 14, 2025 • 148

upvoted 2 collections about 1 year ago

Wan2.1 14B 480p I2V LoRAs

Collection

A collection of Remade's Wan2.1 14B 480p I2V LoRAs • 49 items • Updated May 24, 2025 • 209

olmOCR

Collection

olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 12 items • Updated Dec 23, 2025 • 150

luokai

AI & ML interests

Organizations

ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing

Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion

