MRI India Journals Vol. 15 No. 1 (2026)

AI Automatic Pronunciation Mistake Detector

Authors

  • Pratham Singh, Dept. of Computer Engineering, Shree L.R Tiwari College Of Engineering
  • Rishikesh Singh, Dept. of Computer Engineering, Shree L.R Tiwari College Of Engineering
  • Siddhi Sankhe, Dept. of Computer Engineering, Shree L.R Tiwari College Of Engineering
  • Narendra Prajapati, Dept. of Computer Engineering, Shree L.R Tiwari College Of Engineering
  • Manasi Churi, Dept. of Computer Engineering, Shree L.R Tiwari College Of Engineering

DOI:

https://doi.org/10.65521/ijacte.v15i1.2626

Keywords:

Automated Pronunciation Evaluation, Phoneme-Level Assessment, Multilingual ASR, Grapheme-to-Phoneme Conversion, Dynamic Time Warping

Abstract

Automated pronunciation evaluation plays a major role in computer-assisted language learning, most prominently for English but also for many other languages. However, effective multilingual pronunciation assessment systems are not yet fully developed, particularly for Indic languages, which have complex scripts and phonetic systems. Most existing systems rely on word-level scoring or limited acoustic models, which restricts phoneme-level assessment and makes it difficult to accommodate linguistic diversity. In addition, recognition errors from ASR systems reduce the accuracy of the scoring process. This paper proposes a framework for phoneme-level pronunciation assessment in English, Hindi, and Marathi. The system integrates the Whisper ASR model, word-level timestamp extraction, grapheme-to-phoneme conversion, Dynamic Time Warping (DTW) for robust word alignment, and phoneme-level Levenshtein distance scoring. A schwa deletion module is included for the languages written in Devanagari (Hindi and Marathi); it is designed to eliminate the effect of the inherent schwa on pronunciation scores.
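
As an illustration of the phoneme-level Levenshtein distance scoring mentioned above, the following is a minimal Python sketch, not the authors' implementation. The function names `levenshtein` and `phoneme_score`, the unit edit costs, and the normalization by the longer sequence are all assumptions made for this example; the abstract does not specify how distances are converted to scores.

```python
from typing import Sequence


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Edit distance between two phoneme sequences with unit costs."""
    # prev[j] holds the distance between the first i-1 reference
    # phonemes and the first j hypothesis phonemes.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of an extra phoneme
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_score(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Hypothetical score in [0, 1]: 1 minus the normalized distance."""
    if not ref and not hyp:
        return 1.0
    return 1.0 - levenshtein(ref, hyp) / max(len(ref), len(hyp))


# One substituted vowel out of three phonemes.
expected = ["k", "æ", "t"]
spoken = ["k", "ʌ", "t"]
print(levenshtein(expected, spoken))
print(round(phoneme_score(expected, spoken), 3))
```

In this sketch the expected sequence would come from grapheme-to-phoneme conversion of the target text and the spoken sequence from the ASR transcript; both are represented here as lists of IPA symbols.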

The framework uses a modular three-tier architecture: a browser-based audio capture interface, a Flask-based REST API backend, and an extensible AI processing core built on interface-driven model abstractions. Audio signals are normalized and resampled before ASR inference to improve consistency across recording conditions, while DTW-based word alignment over a word distance matrix improves robustness to ASR variability. The experimental results show that word alignment remains stable under recognition noise and that phoneme-level scores consistently discriminate between words of varying pronunciation quality. Word-level categorization and IPA visualization provide actionable pronunciation feedback for multilingual learning scenarios.
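
The DTW-based word alignment over a word distance matrix can be sketched as follows. This is a hedged illustration under stated assumptions rather than the paper's method: the helper names `word_distance` and `dtw_align` are hypothetical, and the word distance used here (one minus the `difflib.SequenceMatcher` similarity ratio) is a stand-in, since the abstract does not say which distance the authors use.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def word_distance(a: str, b: str) -> float:
    """Assumed distance: 1 - string similarity ratio (0.0 means identical)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dtw_align(ref_words: List[str], hyp_words: List[str]) -> List[Tuple[int, int]]:
    """Align reference words to recognized words with classic DTW.

    Returns (ref_index, hyp_index) pairs along the lowest-cost warping path.
    """
    n, m = len(ref_words), len(hyp_words)
    if n == 0 or m == 0:
        return []
    inf = float("inf")
    # cost[i][j] is the cheapest accumulated distance for aligning the
    # first i reference words with the first j recognized words.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = word_distance(ref_words[i - 1], hyp_words[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end of both sequences to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


# A misrecognized word ("quik") still aligns with its reference word.
print(dtw_align(["the", "quick", "fox"], ["the", "quik", "fox"]))
# prints [(0, 0), (1, 1), (2, 2)]
```

Because the warping path may pair one word with several neighbours, an alignment of this kind can absorb words that the recognizer splits, merges, or misspells, which is the sort of robustness to ASR variability the abstract describes.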

 

Published

2026-05-03

How to Cite

Singh, P., Singh, R., Sankhe, S., Prajapati, N., & Churi, M. (2026). AI Automatic Pronunciation Mistake Detector. International Journal on Advanced Computer Theory and Engineering, 15(1), 110–124. https://doi.org/10.65521/ijacte.v15i1.2626

Section

Articles
