Skywork/R1V4

OrlandoHugBot committed 11 days ago

Commit e92c73c (verified) · 1 parent: 5c02ef0

Update README.md

Files changed (1): README.md (+30 -15)

@@ -1,4 +1,16 @@
-# 📌 Overview
 **Skywork-R1V4** is a 30B (A3B) multimodal agent that unifies:
 - Multimodal task planning
@@ -6,7 +18,7 @@
 - Deep multimodal search (text × image)
 - Interleaved tool-grounded reasoning
-Unlike traditional VLMs that treat visual operations and search as disjoint capabilities—or agent systems that rely heavily on costly RL—Skywork-R1V4 is trained **purely via supervised finetuning** on **< 30k high-quality, execution-consistent trajectories**.
 At inference time, the model exhibits **emergent long-horizon reasoning**, executing **10+ tool calls** across visual operations and web search to solve complex real-world tasks.
@@ -16,25 +28,28 @@ Skywork-R1V4 achieves **state-of-the-art performance** on multimodal search benc
 - **Beats Gemini 2.5 Flash on all 11 comparable metrics**
-# 🚀 Key Features
-### **“Thinking With Images”**
-Skywork-R1V4 actively manipulates images:
-- Multi-stage cropping
-- Local detail extraction
-- Region attention
-- Visual clue refinement
-### **Interleaved Reasoning**
 The model alternates between:
-1. Visual reasoning
-2. Image operation
-3. Web search
-4. Cross-evidence verification
 ---
-# 🔗 Links
 - **Model Center**: https://platform.skyworkmodel.ai/#/model-center
 - **API Documentation (R1V4)**: https://docs.skyworkmodel.ai/r1v4/api-reference/completions.html

+---
+pipeline_tag: image-text-to-text
+library_name: transformers
+license: mit
+---
+# Skywork-R1V4
+<div align="center">
+  <img src="skywork-logo.png" alt="Introduction Image" width="500" height="400">
+</div>
+## 1. Model Introduction
 **Skywork-R1V4** is a 30B (A3B) multimodal agent that unifies:
 - Multimodal task planning
 - Deep multimodal search (text × image)
 - Interleaved tool-grounded reasoning
+Skywork-R1V4 is trained **purely via supervised finetuning** on **< 30k high-quality, execution-consistent trajectories**.
 At inference time, the model exhibits **emergent long-horizon reasoning**, executing **10+ tool calls** across visual operations and web search to solve complex real-world tasks.
 - **Beats Gemini 2.5 Flash on all 11 comparable metrics**
+## 2. Feature
+### 🔍 **“Thinking With Images”**
+Skywork-R1V4 actively manipulates images through:
+&nbsp;&nbsp;&nbsp;&nbsp;• Multi-stage cropping
+&nbsp;&nbsp;&nbsp;&nbsp;• Local detail extraction
+&nbsp;&nbsp;&nbsp;&nbsp;• Region attention
+&nbsp;&nbsp;&nbsp;&nbsp;• Visual clue refinement
+### 🔄 **Interleaved Reasoning**
 The model alternates between:
+&nbsp;&nbsp;&nbsp;&nbsp;1. Visual reasoning
+&nbsp;&nbsp;&nbsp;&nbsp;2. Image operation
+&nbsp;&nbsp;&nbsp;&nbsp;3. Web search
+&nbsp;&nbsp;&nbsp;&nbsp;4. Cross-evidence verification
 ---
+## 3. Links
 - **Model Center**: https://platform.skyworkmodel.ai/#/model-center
 - **API Documentation (R1V4)**: https://docs.skyworkmodel.ai/r1v4/api-reference/completions.html
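
The interleaved cycle listed under "Interleaved Reasoning" (visual reasoning, image operation, web search, cross-evidence verification) can be illustrated with a small, self-contained sketch. This is not the Skywork-R1V4 implementation or its API: `crop_box`, `run_episode`, the `search` callable, and the list of relative regions are all hypothetical stand-ins for decisions the model would make itself, and the crop is done on box coordinates only so that no image library is required.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]           # (left, top, right, bottom) in pixels
Rel = Tuple[float, float, float, float]   # same order, as fractions of a box


@dataclass
class Step:
    kind: str    # "visual_reasoning", "image_operation", "web_search", "verification"
    detail: str


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)

    def add(self, kind: str, detail: str) -> None:
        self.steps.append(Step(kind, detail))


def crop_box(outer: Box, rel: Rel) -> Box:
    """Map a relative region of `outer` to absolute pixel coordinates."""
    left, top, right, bottom = outer
    width, height = right - left, bottom - top
    return (
        left + round(rel[0] * width),
        top + round(rel[1] * height),
        left + round(rel[2] * width),
        top + round(rel[3] * height),
    )


def run_episode(
    image_size: Tuple[int, int],
    regions: List[Rel],
    search: Callable[[str], List[str]],
    max_calls: int = 10,
) -> Tuple[Trajectory, Box, int]:
    """Alternate the four stages once per region (hypothetical control loop).

    Each pass makes two tool calls: one crop and one search.
    """
    traj = Trajectory()
    box: Box = (0, 0, image_size[0], image_size[1])
    calls = 0
    for rel in regions:
        if calls + 2 > max_calls:  # stop before exceeding the tool-call budget
            break
        # 1. Visual reasoning: decide where to look next (given here as input).
        traj.add("visual_reasoning", f"focus on relative region {rel}")
        # 2. Image operation: multi-stage cropping narrows the previous crop.
        box = crop_box(box, rel)
        calls += 1
        traj.add("image_operation", f"crop to {box}")
        # 3. Web search: query an external source about the cropped region.
        hits = search(f"object in region {box}")
        calls += 1
        traj.add("web_search", f"{len(hits)} results")
        # 4. Cross-evidence verification: here, only checks that evidence exists.
        traj.add("verification", "supported" if hits else "unsupported")
    return traj, box, calls


if __name__ == "__main__":
    def fake_search(query: str) -> List[str]:
        # Stub standing in for a real search tool.
        return [f"snippet for: {query}"]

    trajectory, final_box, n_calls = run_episode(
        (1000, 800),
        [(0.5, 0.5, 1.0, 1.0), (0.0, 0.0, 0.5, 0.5)],
        fake_search,
    )
    for step in trajectory.steps:
        print(step.kind, "->", step.detail)
    print("final crop:", final_box, "tool calls:", n_calls)
```

Each crop is expressed relative to the previous one, which is one simple way to model multi-stage cropping; a real agent would choose regions and queries from the image content rather than from a fixed list.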