# NextInnoMind / next_bemba_ai

Bemba Whisper ASR (Automatic Speech Recognition): a Whisper model fine-tuned for the Bemba language only. Developed and maintained by NextInnoMind, led by Chalwe Silas.
## Model Type

- Architecture: WhisperForConditionalGeneration, fine-tuned using openai/whisper-small
- Framework: Transformers
- Checkpoint Format: Safetensors
- Languages: Bemba
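The checkpoint loads like any Whisper model on the Hub. A minimal loading sketch (assuming the standard Transformers classes; nothing here is specific to this repo beyond the model ID):

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Weights are stored as safetensors and load transparently via from_pretrained
model = WhisperForConditionalGeneration.from_pretrained("NextInnoMind/next_bemba_ai")
processor = WhisperProcessor.from_pretrained("NextInnoMind/next_bemba_ai")
```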
## Model Description
This model is a Whisper Small variant fine-tuned exclusively for Bemba, a major Zambian language. It is designed to enhance local language ASR performance and promote indigenous language technology.
## Training Details
- Base Model: openai/whisper-small
- Dataset: BembaSpeech (curated dataset of Bemba audio + transcripts)
- Training Time: 8 epochs (~45 hours on an A100 GPU)
- Learning Rate: 1e-5
- Batch Size: 16
- Framework: Transformers + Accelerate
- Tokenizer: WhisperProcessor with task="transcribe" (no language token used)
## Usage
```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="NextInnoMind/next_bemba_ai",
    chunk_length_s=30,
    return_timestamps=True,
)

# Example
result = pipe("path_to_audio.wav")
print(result["text"])
```
Tip: No language token is required. The model is fine-tuned for Bemba only.
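If you need lower-level control than the pipeline (custom decoding, batching), here is a hedged sketch using the model and processor directly. The `librosa` loader is an assumption (any 16 kHz mono loader works), and passing `task="transcribe"` to `generate` mirrors the fine-tuning setup described above.

```python
import torch
import librosa  # assumption: any library yielding 16 kHz mono float audio works
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("NextInnoMind/next_bemba_ai")
processor = WhisperProcessor.from_pretrained("NextInnoMind/next_bemba_ai")

# Whisper expects 16 kHz mono input
speech, _ = librosa.load("path_to_audio.wav", sr=16000)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    # task="transcribe" only; no language token, matching the fine-tuning setup
    generated_ids = model.generate(inputs.input_features, task="transcribe")

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```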
## Applications
- Education: Local language transcriptions and learning tools
- Broadcast & Media: Transcribe Bemba radio and TV shows
- Research: Bantu language documentation and analysis
- Accessibility: Voice-to-text systems in local apps and platforms
## Limitations & Biases
- Trained only on Bemba: does not support English or other languages.
- Accuracy may drop with heavy background noise or strong dialectal variation.
- Not optimized for code-switching or informal speech styles.
## Evaluation
| Language | WER (Word Error Rate) | Dataset |
|---|---|---|
| Bemba | ~16.7% | BembaSpeech Eval Set |
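To sanity-check the reported figure on your own audio, WER can be recomputed with the `evaluate` library; the transcripts below are placeholders, not BembaSpeech data.

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholders: substitute your ground-truth transcripts and model outputs
references = ["ground-truth Bemba transcript"]
predictions = ["model output for the same clip"]

print(f"WER: {wer_metric.compute(references=references, predictions=predictions):.1%}")
```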
## Environmental Impact
- Hardware: A100 40GB x1
- Training Time: ~45 hours
- Carbon Emissions: ~20.4 kg CO₂ (estimated via ML CO2 Impact)
## Citation
```bibtex
@misc{nextbembaai2025,
  title={NextInnoMind next_bemba_ai: Whisper-based ASR model for Bemba},
  author={Silas Chalwe and NextInnoMind},
  year={2025},
  howpublished={\url{https://huggingface.co/NextInnoMind/next_bemba_ai}},
}
```
## Maintainers
- Chalwe Silas (Lead Developer & Dataset Curator)
- Team NextInnoMind
Contact:
- GitHub: SilasChalwe
## Related Resources

Fine-tuned in Zambia.