Skywork/R1V4

OrlandoHugBot committed 10 days ago

Commit 5c02ef0

1 Parent(s): e1a59d2

Update README.md

Files changed (1)

README.md CHANGED

@@ -1,3 +1,40 @@
----
-license: apache-2.0
----

+# 📌 Overview
+**Skywork-R1V4** is a 30B (A3B) multimodal agent that unifies:
+- Multimodal task planning
+- Active image manipulation (“thinking with images”)
+- Deep multimodal search (text × image)
+- Interleaved tool-grounded reasoning
+
+Unlike traditional VLMs, which treat visual operations and search as disjoint capabilities, and agent systems that rely heavily on costly RL, Skywork-R1V4 is trained **purely via supervised fine-tuning** on **fewer than 30k high-quality, execution-consistent trajectories**.
+
+At inference time, the model exhibits **emergent long-horizon reasoning**, executing **10+ tool calls** across visual operations and web search to solve complex real-world tasks.
+
+Skywork-R1V4 achieves **state-of-the-art performance** on multimodal search benchmarks:
+- **MMSearch: 66.1**
+- **FVQA: 67.2**
+- **Beats Gemini 2.5 Flash on all 11 comparable metrics**
+# 🚀 Key Features
+### **“Thinking With Images”**
+Skywork-R1V4 actively manipulates images:
+- Multi-stage cropping
+- Local detail extraction
+- Region attention
+- Visual clue refinement
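Reviewer note (not part of the commit): the multi-stage cropping listed above can be pictured as repeatedly narrowing a bounding box, where each stage selects a sub-region of the previous crop. The sketch below is only an illustration of that coordinate arithmetic; `refine_box` and `multi_stage_crop` are hypothetical names and are not taken from Skywork-R1V4 or its API.

```python
from typing import List, Sequence, Tuple

# (left, top, right, bottom) in pixels of the original image
Box = Tuple[float, float, float, float]


def refine_box(parent: Box, rel: Box) -> Box:
    """Map a box given as fractions (0..1) of `parent` to absolute pixels."""
    pl, pt, pr, pb = parent
    rl, rt, rr, rb = rel
    width, height = pr - pl, pb - pt
    return (pl + rl * width, pt + rt * height,
            pl + rr * width, pt + rb * height)


def multi_stage_crop(image_size: Tuple[int, int],
                     stages: Sequence[Box]) -> List[Box]:
    """Return the absolute crop box after each stage, starting with the full image."""
    box: Box = (0.0, 0.0, float(image_size[0]), float(image_size[1]))
    history = [box]
    for rel in stages:
        box = refine_box(box, rel)  # zoom into a sub-region of the current crop
        history.append(box)
    return history


# Hypothetical example: take the bottom-right quadrant, then its top-left quadrant.
boxes = multi_stage_crop((1000, 800), [(0.5, 0.5, 1.0, 1.0), (0.0, 0.0, 0.5, 0.5)])
```

Keeping every box in the coordinates of the original image means a later stage can always be traced back to the pixels it came from, which is one plausible way to keep successive crops consistent with each other.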
+### **Interleaved Reasoning**
+The model interleaves four kinds of steps:
+1. Visual reasoning
+2. Image operations
+3. Web search
+4. Cross-evidence verification
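Reviewer note (not part of the commit): the interleaved pattern described above resembles a generic tool-calling loop, in which a policy picks the next action, a tool executes it, and the observation is fed back until the policy decides to answer. The sketch below assumes that shape. The tool names, `scripted_policy`, and `run_agent` are hypothetical stand-ins; the real model and its tools are documented only at the links in the README.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str]  # (tool name or "answer", argument)

# Stub tools standing in for image operations, web search and verification.
TOOLS: Dict[str, Callable[[str], str]] = {
    "crop": lambda arg: f"cropped region '{arg}'",
    "search": lambda arg: f"results for '{arg}'",
    "verify": lambda arg: f"checked {arg}",
}


def scripted_policy(evidence: List[str]) -> Action:
    """Stand-in for the model: follows a fixed plan, then answers."""
    plan = [("crop", "top-left sign"),
            ("search", "text on sign"),
            ("verify", "crop vs. search")]
    if len(evidence) < len(plan):
        return plan[len(evidence)]
    return ("answer", f"answer supported by {len(evidence)} observations")


def run_agent(policy: Callable[[List[str]], Action],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 16) -> Tuple[Optional[str], List[str]]:
    """Alternate between choosing an action and executing the chosen tool."""
    evidence: List[str] = []
    for _ in range(max_steps):
        name, arg = policy(evidence)
        if name == "answer":
            return arg, evidence
        evidence.append(f"{name}: {tools[name](arg)}")  # feed observation back
    return None, evidence  # step budget exhausted without an answer


answer, evidence = run_agent(scripted_policy, TOOLS)
```

The `max_steps` cap is a design choice of this sketch: a long-horizon loop needs some budget so that a policy that never answers still terminates.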
+---
+# 🔗 Links
+- **Model Center**: https://platform.skyworkmodel.ai/#/model-center
+- **API Documentation (R1V4)**: https://docs.skyworkmodel.ai/r1v4/api-reference/completions.html