Exploring the automatic mispronunciation detection of confusable phones for Mandarin

Mispronunciation detection is one of the vital tasks of Computer-Assisted Language Learning (CALL) systems. Many methods have been introduced to accomplish this task, but few of them address detection on confusable phones. In this paper, phone-level classifiers are used to improve detection performance on confusable phones. The features of each classifier are posterior probability vectors calculated from the corresponding acoustic models. In addition, a confusion matrix is extracted and incorporated to calculate derivatives of the posterior probability vectors. Experiments on our Mandarin database validate the effectiveness of the proposed method compared with the commonly used posterior probability and phone-dependent threshold methods.
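
As a rough illustration of the baseline mentioned above, the following sketch normalizes per-phone acoustic log-likelihoods into a posterior probability vector and applies a phone-dependent threshold to the canonical phone. It is not the paper's implementation: the function names, the uniform-prior assumption, the example scores, and the threshold values are all hypothetical. The same posterior vector is the kind of feature a phone-level classifier could consume in place of a single thresholded value.

```python
import math


def posterior_vector(log_likelihoods):
    """Turn per-phone acoustic log-likelihoods into posteriors.

    Assumes uniform phone priors (an assumption of this sketch).
    The maximum is subtracted before exponentiation for numerical stability.
    """
    m = max(log_likelihoods.values())
    exp_scores = {p: math.exp(ll - m) for p, ll in log_likelihoods.items()}
    total = sum(exp_scores.values())
    return {p: s / total for p, s in exp_scores.items()}


def is_mispronounced(posteriors, canonical_phone, thresholds):
    """Flag a segment when the canonical phone's posterior falls below
    that phone's own threshold (phone-dependent threshold baseline)."""
    return posteriors[canonical_phone] < thresholds[canonical_phone]


# Hypothetical scores for one segment whose canonical phone is "zh";
# "z" and "j" stand in for confusable competitors.
segment_scores = {"zh": -42.0, "z": -40.5, "j": -47.0}
thresholds = {"zh": 0.5, "z": 0.5, "j": 0.5}  # illustrative values only

post = posterior_vector(segment_scores)
flag = is_mispronounced(post, "zh", thresholds)
```

Here the competitor "z" scores higher than the canonical "zh", so the canonical posterior falls below its threshold and the segment is flagged. A classifier trained on the full vector `post` could, in principle, separate confusable pairs that a single threshold cannot.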
