Design and Implementation of an English Pronunciation Scoring System for Pupils Based on DNN-HMM

Poor English pronunciation is widespread among Chinese pupils, largely because of a shortage of foreign teachers and of suitable learning products and resources. To help pupils improve their pronunciation, we have developed an English word pronunciation scoring system. The system uses DNN-HMMs (deep neural network-hidden Markov models) as acoustic models for English speech recognition, and the GOP (Goodness of Pronunciation) algorithm to score pronunciation from the likelihoods output by the DNN-HMMs. The method is implemented with HTK (Hidden Markov Model Toolkit), which integrates a set of tools for building speech recognition systems. The system has also been deployed in practice: the server side performs model training, while the client side collects pupils' speech data and returns the inferred scores to the pupils. For evaluation, we collect 150 word pronunciations from 30 pupils as test data and compare the scores given by the system with the scores given by teachers. The results demonstrate the reliability of the developed DNN-HMM-based pupil pronunciation scoring system.
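
To illustrate the scoring step described above, the following is a minimal sketch of a GOP computation in its commonly cited form: the absolute difference between the log-likelihood of a segment under the intended phone (from forced alignment) and the best log-likelihood over all phones (from free phone recognition), normalized by the number of frames. It is not the authors' implementation. The function names, the `worst_gop` threshold, and the linear mapping to a 0-100 word score are assumptions made for illustration only; the abstract does not specify how GOP values are converted into the scores shown to pupils.

```python
def gop_score(forced_loglik, best_loglik, num_frames):
    """Return the GOP value for one phone segment.

    forced_loglik: log-likelihood of the segment under the intended phone.
    best_loglik:   highest log-likelihood of the segment over all phones.
    num_frames:    segment duration in frames, used for normalization.
    A value near 0 means the intended phone explains the audio about as
    well as the best competing phone.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return abs(forced_loglik - best_loglik) / num_frames


def word_score(phone_gops, worst_gop=5.0):
    """Map per-phone GOP values to a 0-100 word score.

    Hypothetical mapping: the mean GOP is clipped at worst_gop and
    scaled linearly, so GOP 0 gives 100 and GOP >= worst_gop gives 0.
    """
    if not phone_gops:
        raise ValueError("phone_gops must not be empty")
    mean_gop = sum(phone_gops) / len(phone_gops)
    clipped = min(mean_gop, worst_gop)
    return 100.0 * (1.0 - clipped / worst_gop)


if __name__ == "__main__":
    # Illustrative log-likelihoods, not values taken from the paper.
    gops = [
        gop_score(-120.0, -100.0, 10),
        gop_score(-80.0, -80.0, 8),
    ]
    print(word_score(gops))
```

In practice the two log-likelihoods would come from the recognizer's forced-alignment and phone-loop decoding passes, and the threshold would need to be calibrated against teacher scores.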
