Toward unsupervised discovery of pronunciation error patterns using universal phoneme posteriorgram for computer-assisted language learning

In Computer-Aided Pronunciation Training, we hope to specify the type of mispronunciation, or Error Pattern (EP), that the language learner has made, so as to give more effective feedback. However, deriving EPs usually requires expert knowledge and pedagogical experience, which are not easy to obtain for each pair of target and native languages. In this paper we propose a preliminary framework toward unsupervised discovery of EPs from a corpus of learners' recordings. We use the Universal Phoneme Posteriorgram, derived from a Multi-Layer Perceptron trained on a corpus of mixed languages, as features to bring supervised knowledge into the unsupervised task. We also use the Hierarchical Agglomerative Clustering algorithm to explore sub-segmental variation within phoneme segments for distinguishing EPs. For EP discovery, we tested K-means (assuming a known number of EPs) and a Gaussian Mixture Model with the minimum description length principle (estimating an unknown number of EPs). Preliminary experimental results illustrate the effectiveness of the proposed framework, although it still has a long way to go compared to human annotators.
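
As a rough illustration of the K-means variant described above (known number of EPs), the following minimal sketch clusters phoneme segments by their posteriorgram features. Everything in it is an assumption made for illustration: the toy 3-class posterior vectors, the helper names (`mean_vector`, `sq_dist`, `kmeans`), the farthest-point initialisation, and in particular the use of a simple frame average per segment, which stands in for the sub-segmental analysis with Hierarchical Agglomerative Clustering and is not the paper's actual feature construction.

```python
def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=50):
    """Plain K-means with deterministic farthest-point initialisation.

    Returns (labels, centers). The initialisation scheme is an
    assumption of this sketch, chosen only to make runs repeatable.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from all centers chosen so far.
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(max_iters):
        # Assignment step: nearest center for every point.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each non-empty cluster's center.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_vector(members)
    return labels, centers


# Hypothetical toy data: four segments of one target phoneme, each a list
# of frame-level posterior vectors over three made-up phoneme classes.
segments = [
    [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],  # mass on class 0
    [[0.85, 0.10, 0.05], [0.90, 0.05, 0.05]],  # mass on class 0
    [[0.10, 0.80, 0.10], [0.05, 0.90, 0.05]],  # mass on class 1
    [[0.10, 0.85, 0.05], [0.20, 0.70, 0.10]],  # mass on class 1
]

# Simplification: one averaged posterior vector per segment.
segment_features = [mean_vector(frames) for frames in segments]
labels, centers = kmeans(segment_features, k=2)
```

In this toy setup the first two segments and the last two segments land in separate clusters, which is the kind of grouping one would hope corresponds to distinct pronunciation patterns. When the number of EPs is unknown, a model-selection criterion such as minimum description length would be needed instead of a fixed `k`.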
