Automatic mispronunciation detection for English learners by GMM-UBM and GLDS-SVM methods

This paper proposes an efficient mispronunciation detection method based on the generalized linear discriminant sequence SVM (GLDS-SVM). First, to better describe pronunciation characteristics, we introduce an improved SVM feature normalization scheme based on a state-concatenated operation. Then, we propose a novel multi-model training strategy that makes full use of the samples and addresses the data imbalance caused by the lack of an actual mispronunciation corpus. Finally, we combine GLDS-SVM with a universal background model based GMM (GMM-UBM) to further improve performance. The fused system achieves equal error rates (EER) of 9.92% and 16.35% on the simulation set and the real set, respectively. In addition, GLDS-SVM offers higher computation speed and a smaller model size than an SVM with the traditional radial basis function (RBF) kernel.
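
As a rough illustration of the GLDS idea named above, the sketch below maps a variable-length sequence of feature frames to one fixed-length vector by averaging a polynomial expansion of each frame, then compares two such vectors with a normalized inner product. It is a minimal sketch under stated assumptions, not the paper's implementation: the expansion degree (2), the diagonal approximation of the correlation matrix, the random toy data, and the names `poly_expand`, `glds_vector`, and `glds_kernel` are all hypothetical choices made for illustration. The state-concatenated normalization and the multi-model training strategy are not reproduced here.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_expand(frames, degree=2):
    """Map each frame (row) to all monomials of its components up to `degree`."""
    frames = np.asarray(frames, dtype=float)
    n_frames, dim = frames.shape
    columns = [np.ones(n_frames)]  # degree-0 term
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(dim), d):
            # Product of the selected components, e.g. x0*x1 for idx == (0, 1).
            columns.append(np.prod(frames[:, list(idx)], axis=1))
    return np.stack(columns, axis=1)


def glds_vector(frames, degree=2):
    """Average the expanded frames so any sequence length yields one vector."""
    return poly_expand(frames, degree).mean(axis=0)


def glds_kernel(b_x, b_y, r_diag):
    """Inner product normalized by a diagonal correlation estimate (assumption)."""
    return float(np.sum(b_x * b_y / r_diag))


# Toy data (assumption): random 3-dimensional frames stand in for acoustic features.
rng = np.random.default_rng(0)
background = rng.normal(size=(200, 3))
# Diagonal of the correlation matrix of expanded background frames.
r_diag = (poly_expand(background) ** 2).mean(axis=0)

seq_a = rng.normal(size=(40, 3))  # two segments of different length
seq_b = rng.normal(size=(25, 3))
score = glds_kernel(glds_vector(seq_a), glds_vector(seq_b), r_diag)
print(f"GLDS kernel value: {score:.4f}")
```

Because each segment collapses to a single vector, a linear SVM can be trained on these vectors directly and stored as one weight vector, which is consistent with the speed and model-size advantage over an RBF kernel that the abstract reports.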
