Automatic Chinese pronunciation error detection using SVM trained with structural features

Learners of a foreign language often make pronunciation errors, and automatic error detection is essential for a Computer-Assisted Language Learning (CALL) system that supports them. This study focuses on Japanese learners of Chinese and investigates the automatic detection of their typical and frequent phoneme production errors. To this end, we create four new databases and propose a detection method that uses a Support Vector Machine (SVM) trained with structural features. The proposed method is compared with two baseline methods, Goodness of Pronunciation (GOP) and Likelihood Ratio (LR), on a phoneme error detection task. Experiments show that the proposed method performs much better than both baselines; for example, the false rejection rate is reduced by as much as 82%. However, the results also reveal some drawbacks of using an SVM with structural features. In this paper, we discuss the merits and demerits of the proposed method and the kinds of real applications in which it works effectively.
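The abstract does not say how the structural features are computed. One common formulation in structure-based pronunciation assessment models each phoneme as a Gaussian distribution and uses the pairwise distances between those distributions (the "structure") as the feature vector. The sketch below assumes that formulation, with diagonal covariances and the Bhattacharyya distance; the names `structure_vector` and `false_rejection_rate` are hypothetical, and this is not the authors' implementation. A vector like this would then be passed to an SVM, for example one trained with LIBSVM. The sketch also shows the usual definition of the false rejection rate: the fraction of correctly pronounced phonemes that the detector wrongly flags as errors.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Assumption: each phoneme is a diagonal-covariance Gaussian,
# given as (mean vector, variance vector).
Gaussian = Tuple[Sequence[float], Sequence[float]]


def bhattacharyya(g1: Gaussian, g2: Gaussian) -> float:
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    dist = 0.0
    for a, b, s1, s2 in zip(m1, m2, v1, v2):
        s = (s1 + s2) / 2.0                              # averaged variance
        dist += (a - b) ** 2 / (8.0 * s)                 # mean-difference term
        dist += 0.5 * math.log(s / math.sqrt(s1 * s2))   # variance-mismatch term
    return dist


def structure_vector(models: Dict[str, Gaussian], order: List[str]) -> List[float]:
    """Hypothetical structural feature: upper triangle of the distance matrix,
    flattened row by row in the given phoneme order."""
    return [bhattacharyya(models[p], models[q]) for p, q in combinations(order, 2)]


def false_rejection_rate(is_error: Sequence[bool], flagged: Sequence[bool]) -> float:
    """Share of correctly pronounced phonemes that were flagged as errors."""
    correct = [f for e, f in zip(is_error, flagged) if not e]
    return sum(correct) / len(correct) if correct else 0.0


if __name__ == "__main__":
    models = {
        "a": ([0.0, 0.0], [1.0, 1.0]),
        "i": ([2.0, 0.0], [1.0, 1.0]),
        "u": ([0.0, 2.0], [1.0, 1.0]),
    }
    print(structure_vector(models, ["a", "i", "u"]))  # [0.5, 0.5, 1.0]
    # Rescaling every feature dimension (a crude stand-in for a speaker
    # difference) leaves the structure unchanged.
    scaled = {p: ([2 * x for x in m], [4 * v for v in var])
              for p, (m, var) in models.items()}
    print(structure_vector(scaled, ["a", "i", "u"]))  # [0.5, 0.5, 1.0]
```

The rescaling check illustrates why a distance-based structure is attractive: the Bhattacharyya distance is unchanged by any invertible affine transform applied to both distributions, so the feature vector ignores some kinds of speaker variation that absolute acoustic scores such as GOP are sensitive to.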
