Subspace Gaussian mixture model for computer-assisted language learning

In computer-assisted language learning (CALL), speech data from non-native speakers are usually insufficient for acoustic modeling. Subspace Gaussian mixture models (SGMMs) have proven effective for training automatic speech recognition (ASR) systems with limited amounts of training data. In this work, we therefore propose using SGMMs to improve fluency assessment performance. The contributions of this work are: (i) the proposed SGMM acoustic model trained with native data outperforms the MMI-GMM/HMM baseline by 25% relative, and (ii) when a small amount of non-native training data is incorporated, the SGMM acoustic model further improves fluency assessment performance by 47% relative.
