Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees

We propose a novel decision-tree-based framework for detecting phonetic mispronunciations that L2 learners produce by using inaccurate speech attributes, such as manner and place of articulation. Compared with conventional score-based computer-assisted pronunciation training (CAPT) systems, the proposed framework has three advantages: (1) each mispronunciation in a tree can be interpreted and communicated to the L2 learner by traversing the corresponding path from a leaf node to the root node; (2) corrective feedback can be given in terms of speech attribute features, which directly describe how consonants and vowels are produced with the relevant articulators; and (3) building a phone-dependent decision tree automatically learns the relative importance of the speech attribute features that distinguish a target phone from other phones. This information allows the speech attribute feedback given to L2 learners to be ranked in order of importance. In addition, experimental results confirm that the proposed approach can detect most pronunciation errors and provide accurate diagnostic feedback.
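The interpretability argument in points (1) and (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the tree is hand-built rather than learned, and the attribute names (`frication`, `retroflexion`), the 0.5 threshold, the attribute scores, and the feedback wording are all hypothetical. The sketch walks a phone-dependent tree over speech attribute scores, records the path, and turns every test that disagrees with the target phone into a feedback message. The path is recorded from root to leaf; reading it in reverse gives the leaf-to-root traversal described above. Because tests nearer the root are assumed to matter more, the messages come out in a rough order of importance.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    attribute: Optional[str] = None  # speech attribute tested at an internal node
    threshold: float = 0.5           # assumed cut-off on the attribute score
    low: Optional["Node"] = None     # branch taken when score < threshold
    high: Optional["Node"] = None    # branch taken when score >= threshold
    label: Optional[str] = None      # set on leaf nodes only


def diagnose(tree: Node, scores: Dict[str, float]) -> Tuple[str, List[Tuple[str, bool]]]:
    """Walk from the root to a leaf, recording each attribute test and its outcome."""
    path: List[Tuple[str, bool]] = []
    node = tree
    while node.label is None:
        present = scores[node.attribute] >= node.threshold
        path.append((node.attribute, present))
        node = node.high if present else node.low
    return node.label, path


def feedback(path: List[Tuple[str, bool]], target: Dict[str, bool]) -> List[str]:
    """Report every tested attribute whose outcome differs from the target phone."""
    messages = []
    for attribute, present in path:
        if target[attribute] != present:
            verb = "strengthen" if target[attribute] else "reduce"
            messages.append(f"{verb} {attribute}")
    return messages


# Hypothetical tree for a retroflex fricative target phone.
TREE = Node(
    attribute="frication",
    low=Node(label="mispronounced"),
    high=Node(
        attribute="retroflexion",
        low=Node(label="mispronounced"),
        high=Node(label="correct"),
    ),
)
TARGET = {"frication": True, "retroflexion": True}

# Hypothetical attribute scores for one learner production.
label, path = diagnose(TREE, {"frication": 0.9, "retroflexion": 0.2})
print(label, feedback(path, TARGET))
```

In a data-driven setting the tree would be learned from labeled learner speech, so the order of the tests, and with it the ranking of the feedback, would come from the training data rather than from a hand-written structure.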
