Zilence006 committed (verified)

Commit 7bb91c8 · Parent(s): 55bdda9

Update README.md

Files changed (1): README.md (+32 −1)
README.md CHANGED
@@ -6,4 +6,35 @@ tags:
 - video
 - audio
 - multimodal
----
+---
+
+# [Vidi: Large Multimodal Models for Video Understanding and Editing](https://arxiv.org/pdf/2504.15681)
+
+Homepage: [https://bytedance.github.io/vidi-website/](https://bytedance.github.io/vidi-website/)
+
+GitHub: [https://github.com/bytedance/vidi](https://github.com/bytedance/vidi)
+
+Demo: [https://vidi.byteintl.com/](https://vidi.byteintl.com/)
+
+> We introduce Vidi, a family of Large Multimodal Models (LMMs) for a wide range of video understanding and editing (VUE) scenarios. The first release focuses on temporal retrieval (TR), i.e., identifying the time ranges in input videos corresponding to a given text query.
+
+This model is the first Vidi release, targeting temporal retrieval.
+
+The inference and evaluation code is available at [https://github.com/bytedance/vidi](https://github.com/bytedance/vidi).
+
+## Citation
+If you find Vidi useful for your research and applications, please cite it with this BibTeX entry:
+```bibtex
+@article{Vidi2025vidi,
+  title={Vidi: Large Multimodal Models for Video
+         Understanding and Editing},
+  author={Vidi Team and Celong Liu and Chia-Wen Kuo and Dawei Du and
+          Fan Chen and Guang Chen and Jiamin Yuan and Lingxi Zhang and
+          Lu Guo and Lusha Li and Longyin Wen and Qingyu Chen and
+          Rachel Deng and Sijie Zhu and Stuart Siew and Tong Jin and
+          Wei Lu and Wen Zhong and Xiaohui Shen and Xin Gu and Xing Mei and
+          Xueqiong Qu},
+  journal={arXiv preprint arXiv:2504.15681},
+  year={2025}
+}
+```
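
For orientation, below is a minimal sketch of the temporal-retrieval (TR) task the README describes: a video plus a text query in, a list of matching time ranges out. The `TimeRange` type and `retrieve` function are hypothetical placeholders for illustration only, not the actual Vidi API; the real inference and evaluation code lives in the linked GitHub repository.

```python
# Hypothetical sketch of the temporal-retrieval (TR) task, NOT the actual
# Vidi API. See https://github.com/bytedance/vidi for the official code.
from dataclasses import dataclass


@dataclass
class TimeRange:
    """One retrieved segment, in seconds from the start of the video."""
    start_s: float
    end_s: float


def retrieve(video_path: str, query: str) -> list[TimeRange]:
    """Placeholder for a TR call: return the time ranges in the video at
    `video_path` that correspond to the natural-language `query`."""
    raise NotImplementedError("use the official code at github.com/bytedance/vidi")


# Expected shape of a result for a query such as "a dog catches a frisbee":
# [TimeRange(start_s=12.4, end_s=18.9), TimeRange(start_s=73.0, end_s=75.5)]
```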