Paper Information - Automatic Pronunciation Evaluation of Non-Native Mandarin Tone by Using Multi-Level Confidence Measures

Automatic evaluation of tone production plays an important role in Computer-Assisted Pronunciation Training (CAPT) systems for tonal languages. In this paper, we propose an automatic evaluation method for non-native Mandarin tones. The method applies multi-level confidence measures generated from a Deep Neural Network (DNN). The confidence measures consist of Log Posterior Ratios (LPR), Average Frame-level Log Posteriors (AFLP) and Segment-level Log Posteriors (SLP). The LPR is calculated between the correct tone model and the competing tone models. The AFLP and LPR are obtained from frame-level scores, whereas the SLP is derived directly from segment-level scores. The multi-level confidence measures are modeled with a support vector machine (SVM) classifier. For comparison, three experiments were conducted with different feature sets: AFLP+LPR, SLP only, and AFLP+LPR+SLP. The experimental results show that the system using all of the multi-level confidence measures performed best, achieving a False Rejection Rate (FRR) of 5.63% and a Diagnostic Accuracy (DA) of 82.45%, which demonstrates the effectiveness of the proposed method.
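The abstract names the three confidence measures but does not give their formulas, so the following is only a minimal sketch of how such features might be assembled for one tone segment. It assumes that the AFLP is the mean over frames of the log posterior of a tone, that each LPR is the difference between the AFLP of the correct tone and that of one competing tone, and that the SLP is the log of a separately supplied segment-level posterior. The function names, the four-class tone inventory, and the `eps` floor are all hypothetical and are not taken from the paper.

```python
import math

EPS = 1e-10  # assumed floor to avoid log(0); not specified in the source


def average_frame_log_posterior(frame_posteriors, tone):
    """Assumed AFLP: mean over frames of log P(tone | frame)."""
    total = sum(math.log(max(frame[tone], EPS)) for frame in frame_posteriors)
    return total / len(frame_posteriors)


def log_posterior_ratios(frame_posteriors, correct_tone):
    """Assumed LPR: AFLP of the correct tone minus AFLP of each competitor."""
    n_tones = len(frame_posteriors[0])
    aflp_correct = average_frame_log_posterior(frame_posteriors, correct_tone)
    return [
        aflp_correct - average_frame_log_posterior(frame_posteriors, t)
        for t in range(n_tones)
        if t != correct_tone
    ]


def confidence_features(frame_posteriors, segment_posteriors, correct_tone):
    """Concatenate AFLP, LPRs and SLP into one feature vector per segment."""
    aflp = average_frame_log_posterior(frame_posteriors, correct_tone)
    lprs = log_posterior_ratios(frame_posteriors, correct_tone)
    slp = math.log(max(segment_posteriors[correct_tone], EPS))  # assumed SLP
    return [aflp] + lprs + [slp]


# Toy example: two frames, four hypothetical tone classes, correct tone = 0.
frames = [[0.7, 0.1, 0.1, 0.1], [0.5, 0.3, 0.1, 0.1]]
segment = [0.8, 0.1, 0.05, 0.05]
features = confidence_features(frames, segment, correct_tone=0)
```

A feature vector of this kind, one per tone segment, could then be passed to any SVM implementation together with correct/incorrect labels; the classifier configuration used by the authors is not stated in the abstract.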

Jinsong Zhang | Yanlu Xie | Ju Lin
