Automatic detection of mispronounced phonemes for language learning tools

Automatic Speech Recognition (ASR) can help language learning tools correct non-native speakers' mispronunciations of foreign words. Most commercially available systems that integrate ASR simply accept or reject whole words or whole sentences. In this paper, we propose a method that identifies pronunciation errors at the phoneme level. This is feasible because such mistakes are often predictable: they typically involve a particular subset of phonemes that are absent from the speaker's native language. We describe two approaches based on hybrid HMM/ANN (hidden Markov model / artificial neural network) technology. We discuss the methodology for training the recognizer and describe a new approach in which a mixed database is used to train a speech recognition system able to detect pronunciation errors at the phoneme level. Preliminary but promising results have been obtained on the DEMOSTHENES database.
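The abstract does not say how the phoneme-level decision is made. The following is a minimal sketch of one common option, swapped in here for illustration and not necessarily the authors' method. It assumes that the ANN of a hybrid HMM/ANN recognizer supplies frame-level phoneme posteriors and that a forced alignment of the expected transcription gives each phoneme's frame span. For each aligned phoneme, it compares the posterior of the expected phoneme with the best competing phoneme (a posterior-ratio score in the spirit of goodness-of-pronunciation measures) and flags the segment if the average log ratio falls below a threshold. The function name `detect_mispronunciations`, the `at_risk` set (standing in for phonemes absent from the learner's native language), the threshold value, and the toy posteriors are all hypothetical.

```python
import numpy as np


def detect_mispronunciations(posteriors, segments, at_risk, threshold=-1.0):
    """Flag aligned phoneme segments whose expected phoneme scores poorly.

    Hypothetical sketch, not the paper's method.
    posteriors: (T, P) array of frame-level phoneme posteriors, assumed to
        come from the ANN of a hybrid HMM/ANN recognizer (rows sum to 1).
    segments: list of (phone_id, start, end) tuples from an assumed forced
        alignment of the expected transcription; `end` is exclusive and
        every segment is assumed to contain at least one frame.
    at_risk: set of phone_ids to check (hypothetically, phonemes absent
        from the learner's native language); other phonemes are skipped.
    threshold: hypothetical decision threshold on the average log ratio.
    Returns a list of (phone_id, start, end, score, flagged) tuples.
    """
    # Clip to avoid log(0) for classes the network assigns zero probability.
    logp = np.log(np.clip(posteriors, 1e-10, 1.0))
    results = []
    for phone_id, start, end in segments:
        if phone_id not in at_risk:
            continue
        seg = logp[start:end]
        # Per frame: log P(expected | x_t) - max_q log P(q | x_t), always <= 0.
        ratio = seg[:, phone_id] - seg.max(axis=1)
        score = float(ratio.mean())
        results.append((phone_id, start, end, score, score < threshold))
    return results


# Toy example: 6 frames, 3 phoneme classes (each row sums to 1).
posteriors = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.7, 0.2],
    [0.1, 0.8, 0.1],
    [0.2, 0.6, 0.2],
])
# Phone 0 spans frames 0-2 and phone 2 spans frames 3-5 (end index exclusive).
segments = [(0, 0, 3), (2, 3, 6)]
results = detect_mispronunciations(posteriors, segments, at_risk={0, 2})
for phone_id, start, end, score, flagged in results:
    print(phone_id, start, end, round(score, 3), flagged)
```

In the toy data, the first segment matches its expected phoneme on every frame and scores 0, while the second segment is dominated by phoneme 1 instead of the expected phoneme 2 and is flagged. Restricting the check to a small set of phonemes mirrors the abstract's observation that errors concentrate on phonemes missing from the speaker's native language; in practice the threshold would have to be tuned on labeled learner speech.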
