Sudong Wang committed on
Commit b62f558 · verified · 1 Parent(s): baed025

Update README.md

Files changed (1)
  1. README.md +6 -60
README.md CHANGED
@@ -12,10 +12,10 @@ pipeline_tag: video-text-to-text
  
  <div align="center">
  
- [![Data](https://img.shields.io/badge/Data-0040A1?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://huggingface.co/collections/lmms-lab/openmmreasoner)
+ [![Data](https://img.shields.io/badge/Data-0040A1?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://huggingface.co/collections/lmms-lab/longvt)
  [![Paper](https://img.shields.io/badge/Paper-000000?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2511.16334)
- [![Project Page](https://img.shields.io/badge/Website-000000?style=for-the-badge&logo=google-chrome&logoColor=white)](https://evolvinglmms-lab.github.io/OpenMMReasoner/)
- [![Github](https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/EvolvingLMMs-Lab/OpenMMReasoner)
+ [![Project Page](https://img.shields.io/badge/Website-000000?style=for-the-badge&logo=google-chrome&logoColor=white)](https://evolvinglmms-lab.github.io/LongVT/)
+ [![Github](https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/EvolvingLMMs-Lab/LongVT)
  </div>
  
  ## Overview
@@ -36,63 +36,9 @@ With a meticulously designed three-stage training strategy and extensive empiric
  
  The model is the RL version of the LongVT and was trained on https://huggingface.co/datasets/longvideotool/LongVT-Parquet.
  
- ## Basic Usage
-
- We present a very basic inference usage here for our model. Our model can be used just as Qwen2.5-VL-7B-Instruct and using vllm. For more detail about using and evaluation of our model, please visit [GitHub](https://github.com/EvolvingLMMs-Lab/LongVT) for more information.
-
- ```python
- from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
- from qwen_vl_utils import process_vision_info
- SYSTEM_PROMPT = (
-     "You are a helpful assistant. When the user asks a question, your response must include two parts: "
-     "first, the reasoning process enclosed in <think>...</think> tags, then the final answer enclosed in <answer>...</answer> tags."
-     "Please provide a clear, concise response within <answer> </answer> tags that directly addresses the question."
- )
- model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-     "OpenMMReasoner/OpenMMReasoner-RL", torch_dtype="auto", device_map="auto"
- )
- processor = AutoProcessor.from_pretrained("OpenMMReasoner/OpenMMReasoner-RL")
- messages = [
-     {
-         "role": "system",
-         "content": [
-             {"type": "text", "text": SYSTEM_PROMPT},
-         ],
-     },
-     {
-         "role": "user",
-         "content": [
-             {
-                 "type": "image",
-                 "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
-             },
-             {"type": "text", "text": "Describe this image."},
-         ],
-     }
- ]
- # Preparation for inference
- text = processor.apply_chat_template(
-     messages, tokenize=False, add_generation_prompt=True
- )
- image_inputs, video_inputs = process_vision_info(messages)
- inputs = processor(
-     text=[text],
-     images=image_inputs,
-     videos=video_inputs,
-     padding=True,
-     return_tensors="pt",
- )
- inputs = inputs.to("cuda")
- # Inference: Generation of the output
- generated_ids = model.generate(**inputs, max_new_tokens=128)
- generated_ids_trimmed = [
-     out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
- ]
- output_text = processor.batch_decode(
-     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
- )
- print(output_text)
- ```
+ ## Usage & Evaluation
+
+ For detailed instructions on inference and evaluation, please refer to our [GitHub repository](https://github.com/EvolvingLMMs-Lab/LongVT). We recommend using the scripts and environment provided there to reproduce our results.
  
  ## Evaluation Results
  