Automatic pronunciation scoring of words and sentences independent from the non-native's first language

This paper describes an approach to automatic scoring of pronunciation quality for non-native speech. It is applicable regardless of the foreign language student's mother tongue. Sentences and words are considered as scoring units. Additionally, mispronunciation and phoneme confusion statistics for the target language phoneme set are derived from human annotations and word level scoring results using a Markov chain model of mispronunciation detection. The proposed methods can be employed to build part of the scoring module of a system for computer assisted pronunciation training (CAPT). Methods from pattern and speech recognition are applied to develop appropriate feature sets for sentence and word level scoring. Besides features well known from and established in previous research, e.g. phoneme accuracy, posterior score, duration score and recognition accuracy, new features such as high-level phoneme confidence measures are identified. The proposed method is evaluated with native English speech, non-native English speech from German, French, Japanese, Indonesian and Chinese adults, and non-native speech from German school children. The speech data are annotated by native English teachers with tags for mispronounced words and with sentence level ratings. Experimental results show that the reliability of automatic sentence level scoring by the system is almost as high as that of the average human evaluator. Furthermore, good performance in detecting mispronounced words is achieved. A validation experiment also verified that the system gives the highest pronunciation quality scores to 90% of native speakers' utterances. Automatic error diagnosis based on an automatically derived phoneme mispronunciation statistic showed reasonable results for five non-native speaker groups. The statistics can be exploited to give non-native speakers feedback on mispronounced phonemes.