Paper information - Transparent pronunciation scoring using articulatorily weighted phoneme edit distance

To study the effects of gamification on children's foreign language learning in the "Say It Again, Kid!" project, we developed a feedback paradigm that can drive gameplay in pronunciation learning games. We describe our scoring system, which is based on the difference between a reference phone sequence and the output of a multilingual CTC phoneme recogniser. We present a white-box scoring model: a mapped, weighted Levenshtein edit distance between the reference and the recognised sequence, in which the error weights for articulatory differences are computed from a training set of scored utterances. The system can produce a human-readable list showing how much each detected mispronunciation contributes to the utterance score. We compare our scoring method to established black-box methods.
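
The idea of an articulatorily weighted edit distance with a per-error breakdown can be illustrated with a minimal sketch. This is not the authors' implementation: the phone inventory, the feature table `FEATURES`, the values in `FEATURE_WEIGHT`, the flat `INDEL_COST`, and the function names are all hypothetical placeholders chosen for illustration. In the paper the weights are computed from scored utterances, which this sketch does not attempt, and the "mapping" step is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical articulatory descriptions (illustrative only, not from the paper).
FEATURES: Dict[str, Dict[str, str]] = {
    "p": {"place": "bilabial", "manner": "plosive", "voicing": "voiceless"},
    "b": {"place": "bilabial", "manner": "plosive", "voicing": "voiced"},
    "t": {"place": "alveolar", "manner": "plosive", "voicing": "voiceless"},
    "d": {"place": "alveolar", "manner": "plosive", "voicing": "voiced"},
    "s": {"place": "alveolar", "manner": "fricative", "voicing": "voiceless"},
    "ae": {"place": "front", "manner": "vowel", "voicing": "voiced"},
}

# Hypothetical hand-picked weights; the paper learns its weights from data.
FEATURE_WEIGHT: Dict[str, float] = {"place": 0.5, "manner": 0.5, "voicing": 0.25}
INDEL_COST = 1.0  # assumed flat cost for an inserted or deleted phone


def sub_cost(ref: str, hyp: str) -> float:
    """Sum the weights of the articulatory features on which two phones differ."""
    if ref == hyp:
        return 0.0
    return sum(w for f, w in FEATURE_WEIGHT.items()
               if FEATURES[ref][f] != FEATURES[hyp][f])


def score_utterance(ref: List[str], hyp: List[str]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the weighted edit distance and a readable list of error contributions."""
    n, m = len(ref), len(hyp)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * INDEL_COST
    for j in range(1, m + 1):
        dist[0][j] = j * INDEL_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),  # match/substitute
                dist[i - 1][j] + INDEL_COST,                            # deletion
                dist[i][j - 1] + INDEL_COST,                            # insertion
            )

    # Trace back through the table to recover which edits produced the total.
    report: List[Tuple[str, float]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]):
            cost = sub_cost(ref[i - 1], hyp[j - 1])
            if cost > 0:
                report.append((f"{ref[i - 1]} -> {hyp[j - 1]}", cost))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + INDEL_COST:
            report.append((f"{ref[i - 1]} deleted", INDEL_COST))
            i -= 1
        else:
            report.append((f"{hyp[j - 1]} inserted", INDEL_COST))
            j -= 1
    report.reverse()
    return dist[n][m], report


if __name__ == "__main__":
    total, errors = score_utterance(["b", "ae", "t"], ["p", "ae", "t"])
    print(total, errors)
```

Because each substitution cost is a sum of named feature weights, the returned list can be shown directly to a learner or teacher, which is the sense in which such a model is "white-box". A real system would also have to turn the distance into a game score, which this sketch leaves out.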
