Very unstable inference

#22

by andypotato - opened Mar 14

Mar 14

The model tends to spiral out of control and produce only garbage output. This often happens in sections where a speaker repeats a single word several times, like "yes yes yes". From that point on, inference just keeps repeating that single word.

Another issue is that inference does not stop once the file has been fully transcribed; it keeps emitting nonsense characters and end tokens.

This behavior can be reproduced with the Gradio demo and with the single-file demos from the GitHub repo.
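For anyone hitting the same thing, a possible stopgap is to post-process the output and cut it off where a runaway loop starts. This is only a rough sketch and does not fix the model. It assumes the transcript is available as a list of word or token strings. The function name `truncate_runaway_repetition` and the `max_run` threshold are hypothetical and not part of the VibeVoice code.

```python
def truncate_runaway_repetition(tokens, max_run=8):
    """Cut the sequence where one token repeats more than max_run times in a row.

    Returns (kept_tokens, was_truncated). A short natural repetition such as
    "yes yes yes" stays intact as long as it does not exceed max_run.
    """
    previous = None
    run_length = 0
    for index, token in enumerate(tokens):
        if token == previous:
            run_length += 1
        else:
            previous = token
            run_length = 1
        if run_length > max_run:
            # Keep the first max_run copies and drop everything after them.
            return list(tokens[:index]), True
    return list(tokens), False


if __name__ == "__main__":
    looping = "ok then".split() + ["yes"] * 50
    kept, truncated = truncate_runaway_repetition(looping)
    print(len(kept), truncated)
```

The threshold is a judgment call: set it too low and genuine repetitions in the speech are cut, set it too high and more garbage gets through before the cutoff.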

RobertLiu0905

Apr 16

You can refer to this PR

RobertLiu0905

Apr 16

https://github.com/microsoft/VibeVoice/pull/228
