Recent Activity

Update README.md

#2 opened 25 days ago by
hypothetical
hypothetical 
posted an update 25 days ago
It was harder than we expected, but we have finally integrated cuDNN Paged Attention into our models!

Read the article here: https://app.thestage.ai/blog/Integrating-cuDNN-Paged-Attention-to-TheStage-AI-Inference?id=8

Llama-8B with cuDNN Paged Attention, including B200 support: TheStageAI/Elastic-Llama-3.1-8B-Instruct
Mistral-Small-24B with cuDNN Paged Attention, including B200 support: TheStageAI/Elastic-Mistral-Small-3.1-24B-Instruct-2503
hypothetical 
posted an update about 1 month ago
We have updated our transcription model: TheStageAI/thewhisper-large-v3-turbo

– 6.00 WER on the English Open ASR Leaderboard
– 4.74 WER on the Multilingual Open ASR Leaderboard
– Beats NVIDIA Parakeet (6.34 WER) and Whisper-large-v3-turbo (7.8 WER)
– Strong improvements in Arabic, Hindi, Chinese
– Maintains quality with background and environmental noise
– Optimized inference engines for NVIDIA and Apple
– Hugging Face Transformers interface for easy use
– Best-in-class speed on NVIDIA GPUs and power efficiency on Apple devices
– NVIDIA Jetson Thor support

update-checkpoint-v2

#2 opened about 1 month ago by
quazim
hypothetical 
posted an update 2 months ago
Hello guys! Would anyone like to test our framework for automated model compression? Here is what it can produce. Move the slider to compress and accelerate the model, select the point you like, and compile. I can give you access; we are currently improving the framework and collecting feedback from users.

TheStageAI/ANNA-LLM