Automatic Pronunciation Evaluation of Non-Native Mandarin Tone by Using Multi-Level Confidence Measures

Automatic evaluation of tone production plays an important role in Computer-Assisted Pronunciation Training (CAPT) systems for tonal languages. In this paper, we propose an automatic evaluation method for non-native Mandarin tones. The method uses multi-level confidence measures generated by a Deep Neural Network (DNN). The confidence measures consist of Log Posterior Ratios (LPR), Average Frame-level Log Posteriors (AFLP) and Segment-level Log Posteriors (SLP). The LPR is calculated between the correct tone model and the competing tone models. The AFLP and LPR are obtained from frame-level scores, whereas the SLP is derived directly from segment-level scores. The multi-level confidence measures are modeled with a support vector machine (SVM) classifier. For comparison, three experiments were conducted with different feature sets: AFLP+LPR, SLP only, and AFLP+LPR+SLP. The experimental results showed that the system using the multi-level confidence measures performed best, achieving a false rejection rate (FRR) of 5.63% and a DA of 82.45%, which demonstrates the effectiveness of the proposed method.
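The three confidence measures can be illustrated with a minimal Python sketch. It assumes, hypothetically, that for each syllable the DNN yields a posterior distribution over tone classes for every frame, plus a single segment-level posterior distribution. The function names, the four-class tone set in the example, the `EPS` floor, and the choice to average the LPR over frames are illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import List, Sequence

EPS = 1e-10  # assumed floor so that log(0) is never evaluated


def aflp(frame_post: Sequence[Sequence[float]], tone: int) -> float:
    """Average Frame-level Log Posterior of `tone` over the segment's frames."""
    return sum(math.log(max(f[tone], EPS)) for f in frame_post) / len(frame_post)


def lprs(frame_post: Sequence[Sequence[float]], tone: int) -> List[float]:
    """Log Posterior Ratio of the canonical tone against each competing tone.

    Averaging the per-frame log ratios equals the difference of the two AFLP
    values, so one value is returned per competing tone.
    """
    n_tones = len(frame_post[0])
    target = aflp(frame_post, tone)
    return [target - aflp(frame_post, k) for k in range(n_tones) if k != tone]


def slp(segment_post: Sequence[float], tone: int) -> float:
    """Segment-level Log Posterior of the canonical tone."""
    return math.log(max(segment_post[tone], EPS))


def confidence_features(frame_post: Sequence[Sequence[float]],
                        segment_post: Sequence[float],
                        tone: int) -> List[float]:
    """Concatenate AFLP, the LPRs and SLP into one feature vector."""
    return [aflp(frame_post, tone)] + lprs(frame_post, tone) + [slp(segment_post, tone)]


if __name__ == "__main__":
    # Two hypothetical frames of four-tone posteriors; tone index 0 is canonical.
    frames = [[0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1]]
    segment = [0.8, 0.1, 0.05, 0.05]
    print(confidence_features(frames, segment, tone=0))
```

With four tone classes the vector holds one AFLP, three LPRs and one SLP. Vectors like this, paired with correct/mispronounced labels, could then train an SVM, for example with scikit-learn's `sklearn.svm.SVC` and its `fit(X, y)` method. Keeping only the first four entries, or only the last one, mirrors the AFLP+LPR and SLP-only comparison settings.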
