OpenGVLab

community

https://github.com/opengvlab

AI & ML interests

Computer Vision

Recent Activity

Rayment authored a paper 5 days ago

MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites

Rayment authored a paper 5 days ago

ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework

linghan199 authored a paper 5 days ago

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Papers

RIVER: A Real-Time Interaction Benchmark for Video LLMs

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

Kaining

authored a paper 2 days ago

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

Paper • 2603.25730 • Published 6 days ago • 45

prithivMLmods

posted an update 5 days ago

The Flux-Klein-KV-Edit-Consistency demo is now available on Spaces. It preserves character identity and delivers high-quality, realistic results after edits. No special prompts are needed: just upload an image, type your prompt, and get the edited image quickly.

🔥 Demo Space: prithivMLmods/flux-klein-kv-edit-consistency
🤗 Model: black-forest-labs/FLUX.2-klein-9b-kv
🤗 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
🔗 Gradio Server Mode: https://www.gradio.app/main/guides/server-mode

➔ Built with Headless Gradio, an alternative to using gr.Blocks for creating the frontend and triggering events, powered by FastAPI + Gradio. You can now design the frontend however you want, with continued support for APIs, MCP, and ZeroGPU.

➔ Gradio Server Mode is available starting with gradio@v6.10.0.

To learn more, visit the app page or the respective model pages.

Rayment

authored 2 papers 5 days ago

MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites

Paper • 2510.12126 • Published Oct 14, 2025 • 2

ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework

Paper • 2603.20644 • Published 12 days ago • 3

linghan199

authored a paper 5 days ago

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published 7 days ago • 124

Rayment

authored a paper 5 days ago

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published 7 days ago • 124

ganlinyang

authored a paper 8 days ago

ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework

Paper • 2603.20644 • Published 12 days ago • 3

prithivMLmods

posted an update 12 days ago

The Map-Anything v1 (Universal Feed-Forward Metric 3D Reconstruction) demo is now available on Hugging Face Spaces. Built with Gradio and integrated with Rerun, it performs multi-image and video-based 3D reconstruction, depth and normal map estimation, and interactive measurements.

🤗 Demo: prithivMLmods/Map-Anything-v1
🤗 Model: facebook/map-anything-v1
🤗 HF Papers: MapAnything: Universal Feed-Forward Metric 3D Reconstruction (2509.13414)

prithivMLmods

posted an update 15 days ago

Introducing QIE-Bbox-Studio! 🔥🤗

The QIE-Bbox-Studio demo is now live, with more precise results and more options. Users can remove objects, add designs, and move objects from one place to another, all with fast 4-step inference.

🤗 Demo: prithivMLmods/QIE-Bbox-Studio
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/QIE-Bbox-Studio

🚀 Models (LoRA):

● QIE-2511-Object-Mover-Bbox: prithivMLmods/QIE-2511-Object-Mover-Bbox
● QIE-2511-Object-Remover-Bbox-v3: prithivMLmods/QIE-2511-Object-Remover-Bbox-v3
● QIE-2511-Outfit-Design-Layout: prithivMLmods/QIE-2511-Outfit-Design-Layout
● QIE-2509-Object-Remover-Bbox-v3: prithivMLmods/QIE-2509-Object-Remover-Bbox-v3
● QIE-2509-Object-Mover-Bbox: prithivMLmods/QIE-2509-Object-Mover-Bbox

🚀 Collection:

● Qwen Image Edit [Layout Bbox]: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-layout-bbox

To learn more, visit the app page or the respective model pages.

Nymbo

posted an update 17 days ago

We should really have a release date range slider on the /models page. I'm tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem, given the amount of noise to sift through.

Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.

prithivMLmods

posted an update 18 days ago

QIE-2509-Object-Remover-Bbox-v3 is a more stable version of the Qwen Image Edit object removal model based on visual grounding. The app was previously featured in HF Spaces of the Week and has now been updated with the latest Bbox-v3 LoRA adapter.

🤗 Demo: prithivMLmods/QIE-Object-Remover-Bbox
🤗 LoRA: prithivMLmods/QIE-2509-Object-Remover-Bbox-v3
🤗 Collection: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-layout-bbox

To learn more, visit the app page or the respective model pages.

Changyao

authored 2 papers 19 days ago

GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing

Paper • 2603.12264 • Published 20 days ago • 14

Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation

Paper • 2603.12247 • Published 20 days ago • 23

wqshao126

authored a paper 20 days ago

RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

Paper • 2603.08561 • Published 23 days ago • 12

Xrenya

in OpenGVLab/InternVideo2-Stage2_1B-224p-f4 21 days ago

Error when using model

#2 opened about 2 months ago by wardaslab

Rayment

authored a paper 21 days ago

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Paper • 2603.09877 • Published 22 days ago • 47

Changyao

authored a paper 21 days ago

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Paper • 2603.09877 • Published 22 days ago • 47

ganlinyang

authored a paper 21 days ago

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Paper • 2603.09877 • Published 22 days ago • 47

prithivMLmods

posted an update 26 days ago

The Qwen3.5 Multimodal Understanding Demo, powered by Qwen3.5-2B, is now available on HF Spaces! Qwen3.5-2B is a lightweight model designed for fast image and video reasoning. Built with Gradio, the demo showcases image QA, video QA, object detection, and 2D point tracking, along with real-time token streaming.

🤗 Demo: prithivMLmods/Qwen-3.5-HF-Demo
✅ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
🔗 Qwen3.5-2B: Qwen/Qwen3.5-2B

To learn more, visit the app page or the respective model pages.

cg1177

submitted a paper to Daily Papers 27 days ago

Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

Paper • 2603.05484 • Published 27 days ago • 4

Team members 118
