longvideotool/LongVT-RL

@@ -12,10 +12,10 @@ pipeline_tag: video-text-to-text
 <div align="center">
-[![Data](https://img.shields.io/badge/Data-0040A1?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://huggingface.co/collections/lmms-lab/openmmreasoner)
 [![Paper](https://img.shields.io/badge/Paper-000000?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2511.16334)
-[![Project Page](https://img.shields.io/badge/Website-000000?style=for-the-badge&logo=google-chrome&logoColor=white)](https://evolvinglmms-lab.github.io/OpenMMReasoner/)
-[![Github](https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/EvolvingLMMs-Lab/OpenMMReasoner)
 </div>
 ## Overview
@@ -36,63 +36,9 @@ With a meticulously designed three-stage training strategy and extensive empiric
 The model is the RL version of the LongVT and was trained on https://huggingface.co/datasets/longvideotool/LongVT-Parquet.
-## Basic Usage
-We present a very basic inference usage here for our model. Our model can be used just as Qwen2.5-VL-7B-Instruct and using vllm. For more detail about using and evaluation of our model, please visit [GitHub](https://github.com/EvolvingLMMs-Lab/LongVT) for more information.
-```python
-from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
-from qwen_vl_utils import process_vision_info
-SYSTEM_PROMPT = (
-    "You are a helpful assistant. When the user asks a question, your response must include two parts: "
-    "first, the reasoning process enclosed in <think>...</think> tags, then the final answer enclosed in <answer>...</answer> tags."
-    "Please provide a clear, concise response within <answer> </answer> tags that directly addresses the question."
-)
-model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-    "OpenMMReasoner/OpenMMReasoner-RL", torch_dtype="auto", device_map="auto"
-)
-processor = AutoProcessor.from_pretrained("OpenMMReasoner/OpenMMReasoner-RL")
-messages = [
-    {
-        "role": "system",
-        "content": [
-            {"type": "text", "text": SYSTEM_PROMPT},
-        ],
-    },
-    {
-        "role": "user",
-        "content": [
-            {
-                "type": "image",
-                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
-            },
-            {"type": "text", "text": "Describe this image."},
-        ],
-    }
-]
-# Preparation for inference
-text = processor.apply_chat_template(
-    messages, tokenize=False, add_generation_prompt=True
-)
-image_inputs, video_inputs = process_vision_info(messages)
-inputs = processor(
-    text=[text],
-    images=image_inputs,
-    videos=video_inputs,
-    padding=True,
-    return_tensors="pt",
-)
-inputs = inputs.to("cuda")
-# Inference: Generation of the output
-generated_ids = model.generate(**inputs, max_new_tokens=128)
-generated_ids_trimmed = [
-    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
-]
-output_text = processor.batch_decode(
-    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
-)
-print(output_text)
-```
 ## Evaluation Results

 <div align="center">
+[![Data](https://img.shields.io/badge/Data-0040A1?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://huggingface.co/collections/lmms-lab/longvt)
 [![Paper](https://img.shields.io/badge/Paper-000000?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2511.16334)
+[![Project Page](https://img.shields.io/badge/Website-000000?style=for-the-badge&logo=google-chrome&logoColor=white)](https://evolvinglmms-lab.github.io/LongVT/)
+[![Github](https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/EvolvingLMMs-Lab/LongVT)
 </div>
 ## Overview
 The model is the RL version of the LongVT and was trained on https://huggingface.co/datasets/longvideotool/LongVT-Parquet.
+## Usage & Evaluation
+For detailed instructions on inference and evaluation, please refer to our [GitHub repository](https://github.com/EvolvingLMMs-Lab/LongVT). We recommend using the scripts and environment provided there to reproduce our results.
 ## Evaluation Results