🫒 NextInnoMind / next_bemba_ai

Bemba Whisper ASR (Automatic Speech Recognition): a fine-tuned Whisper model for the Bemba language only. Developed and maintained by NextInnoMind, led by Chalwe Silas.


🧪 Model Type

  • Architecture: WhisperForConditionalGeneration, fine-tuned from openai/whisper-small
  • Framework: Transformers
  • Checkpoint Format: Safetensors
  • Languages: Bemba


📜 Model Description

This model is a Whisper Small variant fine-tuned exclusively for Bemba, a major Zambian language. It is designed to enhance local language ASR performance and promote indigenous language technology.


📚 Training Details

  • Base Model: openai/whisper-small

  • Dataset:
    • BembaSpeech (curated dataset of Bemba audio + transcripts)

  • Training Time: 8 epochs (~45 hours on A100 GPU)

  • Learning Rate: 1e-5

  • Batch Size: 16

  • Framework: Transformers + Accelerate

  • Tokenizer: WhisperProcessor with task="transcribe" (no language token used), as sketched below
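
For reference, the snippet below is a minimal sketch of how a comparable run could be configured with Seq2SeqTrainingArguments, using the hyperparameters listed above. The output directory, the fp16 flag, and the omitted data loading and preprocessing are assumptions, not the project's actual training script.

from transformers import (
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

# Base checkpoint and processor; task="transcribe", no language token.
processor = WhisperProcessor.from_pretrained("openai/whisper-small", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Hyperparameters as listed above; output_dir is a hypothetical path.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-bemba",
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    num_train_epochs=8,
    fp16=True,                   # assumption: mixed precision on the A100
    predict_with_generate=True,  # generate full sequences during evaluation
)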


🚀 Usage

from transformers import pipeline

# Load the fine-tuned Bemba model as an ASR pipeline.
pipe = pipeline(
    "automatic-speech-recognition",
    model="NextInnoMind/next_bemba_ai",
    chunk_length_s=30,       # process long audio in 30-second chunks
    return_timestamps=True,  # include segment timestamps in the output
)

# Transcribe a local audio file and print the text.
result = pipe("path_to_audio.wav")
print(result["text"])

📌 Tip: No language token is required. The model is fine-tuned for Bemba only.
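
If you prefer working below the pipeline level, the following sketch loads the processor and model directly. The use of librosa for loading 16 kHz mono audio is an assumption; any loader that yields a 16 kHz float waveform works.

import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("NextInnoMind/next_bemba_ai")
model = WhisperForConditionalGeneration.from_pretrained("NextInnoMind/next_bemba_ai")

# Whisper expects 16 kHz mono input.
speech, _ = librosa.load("path_to_audio.wav", sr=16000, mono=True)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

# No language token is needed; the model transcribes Bemba only.
generated_ids = model.generate(inputs.input_features)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])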


🔍 Applications

  • Education: Local language transcriptions and learning tools
  • Broadcast & Media: Transcribe Bemba radio and TV shows
  • Research: Bantu language documentation and analysis
  • Accessibility: Voice-to-text systems in local apps and platforms

⚠️ Limitations & Biases

  • Trained only on Bemba; it does not support English or other languages.
  • Accuracy may drop with heavy background noise or strong dialectal variation.
  • Not optimized for code-switching or informal speech styles.

📊 Evaluation

Language | WER (Word Error Rate) | Dataset
-------- | --------------------- | --------------------
Bemba    | ~16.7%                | BembaSpeech Eval Set
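
To reproduce a WER figure on your own held-out transcripts, the evaluate library (backed by jiwer) can be used as sketched below. The reference and prediction strings are placeholders, not the actual BembaSpeech eval data.

import evaluate

wer_metric = evaluate.load("wer")

# Placeholder transcripts for illustration only.
references = ["mwashibukeni mukwai"]
predictions = ["mwashibukeni"]

wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.2%}")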

🌱 Environmental Impact

  • Hardware: A100 40GB x1
  • Training Time: ~45 hours
  • Carbon Emissions: Estimated ~20.4 kg CO₂ (via ML CO2 Impact)

📄 Citation

@misc{nextbembaai2025,
  title={NextInnoMind next_bemba_ai: Whisper-based ASR model for Bemba},
  author={Silas Chalwe and NextInnoMind},
  year={2025},
  howpublished={\url{https://huggingface.co/NextInnoMind/next_bemba_ai}},
}

🧑‍💻 Maintainers

  • Chalwe Silas (Lead Developer & Dataset Curator)
  • Team NextInnoMind

📬 Contact:

🔗 GitHub: SilasChalwe


Fine-tuned in Zambia.
