Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech

Current approaches to mispronunciation detection and diagnosis (MDD) generally treat phonemes in correct pronunciations and in mispronunciations as the same, even though they may carry different characteristics. In addition, a severe imbalance between correct pronunciations and mispronunciations in the dataset further degrades performance. To address these problems, this paper investigates multi-task (MT) learning to enhance the acoustic-phonemic model (APM) for MDD. Phonemes in correct pronunciations and in mispronunciations are processed separately but trained in a multi-task manner that covers both the correct-pronunciation and the mispronunciation recognition tasks. A feature representation module is further proposed to improve performance. Compared with the baseline APM, the proposed MT-APM and R-MT-APM achieve better Precision, Recall and F-Measure as well as higher mispronunciation detection and diagnosis accuracies. With the feature representation module, R-MT-APM achieves the highest mispronunciation detection accuracy.
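The abstract reports Precision, Recall, F-Measure and detection accuracy without defining them. The sketch below shows one common way to compute these from phone-level detection counts, treating a mispronunciation as the positive class. This convention is an assumption and is not confirmed by the abstract. The names `DetectionCounts` and `detection_metrics` and the example counts are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Phone-level outcomes of a detector (hypothetical container)."""
    true_accept: int   # correct pronunciation, accepted as correct
    false_reject: int  # correct pronunciation, flagged as mispronounced
    false_accept: int  # mispronunciation, accepted as correct
    true_reject: int   # mispronunciation, flagged as mispronounced


def detection_metrics(c: DetectionCounts) -> dict:
    """Compute metrics with mispronunciation as the positive class (assumed)."""
    flagged = c.true_reject + c.false_reject        # everything the detector flagged
    mispronounced = c.true_reject + c.false_accept  # every actual mispronunciation
    total = flagged + c.true_accept + c.false_accept

    precision = c.true_reject / flagged if flagged else 0.0
    recall = c.true_reject / mispronounced if mispronounced else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (c.true_accept + c.true_reject) / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "detection_accuracy": accuracy,
    }


# Illustrative counts only; these are not results from the paper.
example = DetectionCounts(true_accept=900, false_reject=50,
                          false_accept=20, true_reject=30)
metrics = detection_metrics(example)
```

With imbalanced counts like these, detection accuracy can remain high while Precision and Recall on the mispronunciation class stay low, which is why the abstract reports both kinds of measure.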
