Low-Latency, Long-Context SoTA Text-to-Text Translation
The Krutrim translate model translates input text into one of the supported Indic languages. To build Krutrim translate, we increased the context length of the popular IndicTrans2 translation model, extending it from 256 to 4096 tokens. For training, we leveraged the Bharat Parallel Corpus Collection (BPCC), augmented with our own data to enhance performance.
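As a rough illustration of how such a long-context checkpoint could be used, the sketch below loads a seq2seq model through the Hugging Face `transformers` API and translates a batch of English sentences into Hindi. The model id, the FLORES-style language-tag convention, and the 4096-token generation limit are assumptions carried over from the IndicTrans2 family, not a confirmed interface for Krutrim translate; check the model card for the actual usage.

```python
# Minimal sketch, assuming an IndicTrans2-style Hugging Face interface.
# The repo id below is a hypothetical placeholder, and prepending FLORES-200
# source/target tags to the input is an assumption inherited from IndicTrans2.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "krutrim-ai-labs/krutrim-translate"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()

sentences = ["The weather in Mumbai is pleasant today."]
# Assumed IndicTrans2-style input format: "<src_lang> <tgt_lang> <text>".
batch = [f"eng_Latn hin_Deva {s}" for s in sentences]

inputs = tokenizer(batch, padding="longest", truncation=True,
                   max_length=4096, return_tensors="pt")  # extended context window
with torch.no_grad():
    generated = model.generate(**inputs, num_beams=5, max_length=4096)

print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```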
Furthermore, to improve latency, we explored various architectures for both training and distillation. We are open-sourcing the distilled version with 6 encoder and 3 decoder layers, supporting translation in both directions: English to Indic and Indic to English. This architecture achieves at least a 4x reduction in latency compared to both the original IndicTrans2 and the distilled IndicTrans2 models, with minimal decline in performance.
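One straightforward way to verify a latency claim like this is to average wall-clock time over repeated batched calls after a warm-up pass. The `translate` callable below is a purely illustrative stand-in for any model's inference function, not part of our release.

```python
# Illustrative latency harness; `translate` is a hypothetical stand-in for a
# model's batch-inference call (tokenize + generate + decode).
import time
import statistics

def translate(sentences):
    # Placeholder: replace with the actual model call being benchmarked.
    return [s[::-1] for s in sentences]

def measure_latency(fn, batch, warmup=3, runs=20):
    for _ in range(warmup):          # warm-up to exclude one-off setup costs
        fn(batch)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(batch)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

batch = ["This is a sample sentence."] * 8
mean_s, std_s = measure_latency(translate, batch)
print(f"mean latency: {mean_s * 1000:.2f} ms (± {std_s * 1000:.2f} ms) per batch")
```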
The following languages are supported by our model: English, Bengali, Hindi, Kannada, Marathi, Malayalam, Gujarati, Punjabi, Telugu, and Tamil, among others.
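IndicTrans2 identifies languages by their FLORES-200 codes; assuming Krutrim translate keeps the same convention (an assumption, not documented here), the supported languages map to codes as follows.

```python
# FLORES-200 codes for the supported languages; using them as the model's
# language identifiers is an assumption inherited from IndicTrans2.
LANGUAGE_CODES = {
    "English":   "eng_Latn",
    "Bengali":   "ben_Beng",
    "Hindi":     "hin_Deva",
    "Kannada":   "kan_Knda",
    "Marathi":   "mar_Deva",
    "Malayalam": "mal_Mlym",
    "Gujarati":  "guj_Gujr",
    "Punjabi":   "pan_Guru",
    "Telugu":    "tel_Telu",
    "Tamil":     "tam_Taml",
}
```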
We benchmarked our model against IndicTrans2 on the IN22-Gen and IN22-Conv datasets, with the IndicTrans2 results sourced from its paper. Below, we report the chrF++ scores achieved by our model for comparison.
| Language | IN22-Gen (Eng → Ind) | IN22-Gen (Ind → Eng) | IN22-Conv (Eng → Ind) | IN22-Conv (Ind → Eng) |
|---|---|---|---|---|
| Bengali | 51.8 | 50.0 | 63.2 | 60.8 |
| Hindi | 56.7 | 54.4 | 65.4 | 62.0 |
| Kannada | 51.0 | 47.9 | 64.2 | 58.4 |
| Marathi | 51.0 | 48.9 | 63.7 | 60.7 |
| Malayalam | 50.9 | 49.3 | 64.5 | 57.8 |
| Gujarati | 53.5 | 51.4 | 66.5 | 57.7 |
| Punjabi | 50.6 | 50.2 | 63.4 | 58.1 |
| Telugu | 52.4 | 50.0 | 64.8 | 60.4 |
| Tamil | 49.5 | 48.3 | 59.8 | 56.9 |
| **Average** | 51.9 | 50.0 | 63.9 | 59.2 |
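These chrF++ numbers can be reproduced with the `sacrebleu` library, where chrF++ corresponds to the chrF metric with word order 2. The file names in the sketch below are placeholders for the system outputs and reference translations.

```python
# Sketch of chrF++ scoring with sacrebleu; chrF++ is chrF with word_order=2.
# "hypotheses.txt" and "references.txt" are hypothetical file names.
from sacrebleu.metrics import CHRF

with open("hypotheses.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

chrf_pp = CHRF(word_order=2)                     # word_order=2 -> chrF++
score = chrf_pp.corpus_score(hypotheses, [references])
print(f"chrF++: {score.score:.1f}")
```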
This code repository and the model weights are licensed under the Krutrim Community License.