Automatic Speech Recognition
Transformers
Safetensors
VibeVoice
ASR
Transcription
Diarization
Speech-to-Text
Very unstable inference
#22
by andypotato - opened
The model tends to spiral out of control and produce only garbage output. This often happens in sections where a speaker repeats a single word several times, like "yes yes yes". From that point on, inference just keeps repeating that single word.
Another issue is that it does not stop once the file has been fully transcribed and instead keeps emitting nonsense characters and end tokens.
This behavior can be reproduced with the Gradio demo and with the single-file demos from the GitHub repo.
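For anyone who wants to flag the looping in their own output, here is a rough sketch of a post-processing check. It is not part of VibeVoice or Transformers and does not fix the model. The function names and the `min_repeats` threshold are my own assumptions, and it only looks at the plain transcript text, so a natural "yes yes yes" stays below the default threshold.

```python
def find_runaway_repetition(text, min_repeats=8):
    """Return the word index where a run of at least `min_repeats`
    identical words starts, or None if there is no such run.

    Hypothetical helper: the threshold of 8 is an arbitrary assumption.
    """
    words = text.split()
    run_start = 0
    for i in range(1, len(words) + 1):
        # A run ends at the end of the text or when the word changes.
        if i == len(words) or words[i].lower() != words[run_start].lower():
            if i - run_start >= min_repeats:
                return run_start
            run_start = i
    return None


def truncate_at_repetition(text, min_repeats=8):
    """Cut the transcript after the first word of a runaway run.

    Hypothetical helper: everything after the loop begins is dropped,
    which assumes the output never recovers once it starts repeating.
    """
    start = find_runaway_repetition(text, min_repeats)
    if start is None:
        return text
    return " ".join(text.split()[: start + 1])


if __name__ == "__main__":
    sample = "I agree " + "yes " * 12
    print(find_runaway_repetition(sample))
    print(truncate_at_repetition(sample))
```

This only catches single-word loops. Repeated nonsense characters or end tokens after the end of the audio would need a separate check.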
You can refer to this PR