Paper information - Effective articulatory modeling for pronunciation error detection of L2 learner without non-native training data

To provide effective articulatory feedback in computer-assisted pronunciation training (CAPT) systems, we address articulatory modeling of second language (L2) learners' speech without using non-native speech data, which is difficult to collect and annotate on a large scale. Context-dependent articulatory attributes (place and manner of articulation) are modeled with a deep neural network (DNN). To train the non-native articulatory models efficiently, we exploit large speech corpora of the learners' native language and the target language to model inter-language phenomena. This multi-lingual learning is then combined with multi-task learning, which uses phone classification as a sub-task. These methods are applied to Mandarin Chinese pronunciation learning by native speakers of Japanese. The effects are confirmed in attribute classification of native speech and in pronunciation error detection of non-native speech.
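The multi-task idea in the abstract, a shared network that predicts articulatory attributes while also classifying phones as a sub-task, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the class name `MultiTaskNet`, the single tanh hidden layer, the layer sizes, and the `phone_weight` interpolation factor are all assumptions made for the example, and no training step is shown.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Affine layer: one row of weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskNet:
    """Shared hidden layer with two output heads (assumed architecture)."""

    def __init__(self, n_in, n_hidden, n_attr, n_phone, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_hidden, n_in, rng)
        self.attr_head = init_layer(n_attr, n_hidden, rng)    # main task
        self.phone_head = init_layer(n_phone, n_hidden, rng)  # sub-task

    def forward(self, frame):
        hidden = [math.tanh(v) for v in linear(*self.shared, frame)]
        attr_post = softmax(linear(*self.attr_head, hidden))
        phone_post = softmax(linear(*self.phone_head, hidden))
        return attr_post, phone_post


def multitask_loss(attr_post, phone_post, attr_label, phone_label,
                   phone_weight=0.5):
    """Attribute cross-entropy plus a weighted phone cross-entropy."""
    attr_ce = -math.log(attr_post[attr_label])
    phone_ce = -math.log(phone_post[phone_label])
    return attr_ce + phone_weight * phone_ce


# Toy usage: one 4-dimensional acoustic frame with made-up labels.
net = MultiTaskNet(n_in=4, n_hidden=8, n_attr=3, n_phone=5)
attr_post, phone_post = net.forward([0.2, -0.1, 0.4, 0.0])
loss = multitask_loss(attr_post, phone_post, attr_label=1, phone_label=2)
```

Under this sketch, multi-lingual learning would correspond to pooling frames from both the native-language and target-language corpora when training the shared layer. The attribute head is the one that would be used for articulatory feedback, while the phone head only shapes the shared representation during training.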

Jinsong Zhang | Tatsuya Kawahara | Richeng Duan | Masatake Dantsuji
