Dhwani, India's first end-to-end trained speech LLM, is powered by our Krutrim-1 LLM. The model understands speech directly, without a separate speech-to-text (ASR) model, so ASR errors cannot cascade into the LLM. As part of this release, we are open-sourcing the speech-to-text translation capabilities of our Dhwani model. It supports translation between English and eight Indic languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Tamil, and Telugu.

English to Indic:

Metric | en_hin | en_guj | en_mar | en_ben | en_tam | en_tel | en_mal | en_kan |
---|---|---|---|---|---|---|---|---|
Avg | 57.7 | 44.3 | 43.3 | 49.0 | 47.0 | 40.8 | 39.0 | 47.0 |

Indic to English:

Metric | hin_en | guj_en | mar_en | ben_en | tam_en | tel_en | mal_en | kan_en |
---|---|---|---|---|---|---|---|---|
Avg | 35.7 | 34.6 | 33.2 | 19.2 | 25.4 | 17.4 | 38.9 | 28.0 |

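
To compare the two translation directions at a glance, the per-pair scores reported above can be summarized with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the `Avg` rows of the two tables, while the dictionary names and the `summarize` helper are hypothetical and not part of the Dhwani codebase. The unweighted mean across language pairs is our own choice of summary, not a figure reported with the release.

```python
from statistics import mean

# Scores copied from the "Avg" rows of the two tables above.
en_to_indic = {
    "en_hin": 57.7, "en_guj": 44.3, "en_mar": 43.3, "en_ben": 49.0,
    "en_tam": 47.0, "en_tel": 40.8, "en_mal": 39.0, "en_kan": 47.0,
}
indic_to_en = {
    "hin_en": 35.7, "guj_en": 34.6, "mar_en": 33.2, "ben_en": 19.2,
    "tam_en": 25.4, "tel_en": 17.4, "mal_en": 38.9, "kan_en": 28.0,
}


def summarize(scores):
    """Return (unweighted mean, best pair, worst pair) for one direction.

    Hypothetical helper for illustration only.
    """
    best = max(scores, key=scores.get)
    worst = min(scores, key=scores.get)
    return mean(scores.values()), best, worst


for name, scores in [("en -> Indic", en_to_indic), ("Indic -> en", indic_to_en)]:
    avg, best, worst = summarize(scores)
    print(f"{name}: mean={avg:.2f}, best={best}, worst={worst}")
```

In this summary, the English-to-Indic direction scores higher on average than the Indic-to-English direction, and the spread between the best and worst pairs is wider for Indic-to-English.
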
This code repository and the model weights are licensed under the Krutrim Community License.
"IndicST Indian Multilingual Translation Corpus For Evaluating Speech Large Language Models", Sanket Shah, Kavya Ranjan Saxena, Kancharana Manideep Bharadwaj, Sharath Adavanne, Nagaraj Adiga. ICASSP 2025.