Pronunciation Assessment of Japanese Learners of French with GOP Scores and Phonetic Information

In this paper, we report phone-level automatic pronunciation assessment experiments on a French read-speech corpus collected from 23 Japanese speakers learning French as a foreign language. We compare the standard approach, based on Goodness Of Pronunciation (GOP) scores and phone-specific score thresholds, with logistic regression (LR) models. A native French speech corpus, into which artificial pronunciation errors were introduced, was used as the training set. Two errors typical of Japanese speakers were considered: /r/ and /v/, often mispronounced as [l] and [b], respectively. When only GOP scores and the expected phone identity were used as input features, the LR classifier achieved 64.4% accuracy, similar to the 63.8% of the baseline threshold method. Adding phonetic and phonological features as input to the LR model raised accuracy to 77.1%, a significant relative gain of 20.8% over the threshold baseline. This LR model also outperformed another baseline based on linear discriminant models trained on raw filter-bank (f-BANK) coefficients.
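The abstract contrasts two ways of turning GOP scores into mispronunciation decisions: a per-phone threshold and an LR classifier over GOP, phone identity and extra features. The sketch below illustrates both under stated assumptions. It uses a posterior-based GOP variant (mean per-frame log posterior of the expected phone minus that of the best-scoring phone), which is a common formulation but not necessarily the exact one used in this work. The function names, the frame-posterior input format, the gradient-descent trainer and the toy values are all hypothetical illustrations, not data or code from the paper. It also checks the arithmetic behind the 20.8% figure, which is relative to the 63.8% threshold baseline.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical input format: one dict per frame mapping each phone label to its
# log posterior log P(q | o_t), as produced by some acoustic model.
Frames = Sequence[Dict[str, float]]


def gop_score(frames: Frames, expected: str) -> float:
    """Posterior-based GOP variant (assumed): mean over the phone's frames of
    log P(expected | o_t) - max_q log P(q | o_t). Always <= 0."""
    if not frames:
        raise ValueError("phone segment has no frames")
    return sum(f[expected] - max(f.values()) for f in frames) / len(frames)


def threshold_flag(gop: float, phone: str, thresholds: Dict[str, float]) -> bool:
    """Threshold baseline: flag the phone as mispronounced when its GOP
    falls below a phone-specific threshold."""
    return gop < thresholds[phone]


def featurize(gop: float, phone: str, phone_set: Sequence[str],
              extra: Sequence[float] = ()) -> List[float]:
    """Bias term, GOP score, one-hot expected phone, then any extra
    (e.g. phonetic or phonological) features."""
    return [1.0, gop] + [1.0 if p == phone else 0.0 for p in phone_set] + list(extra)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(X: List[List[float]], y: List[int],
             lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict_proba(w: List[float], x: List[float]) -> float:
    """Probability that the phone is mispronounced (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


def relative_gain(baseline: float, new: float) -> float:
    """Relative improvement in percent."""
    return 100.0 * (new - baseline) / baseline


# Toy usage with made-up numbers (not data from the paper).
phones = ["r", "v"]
toy = [(-0.1, "r", 0), (-0.2, "v", 0), (-0.3, "r", 0),
       (-3.0, "r", 1), (-2.5, "v", 1), (-2.8, "v", 1)]
X = [featurize(g, p, phones) for g, p, _ in toy]
y = [label for _, _, label in toy]
w = train_lr(X, y)
print([round(predict_proba(w, x), 2) for x in X])
print(round(relative_gain(63.8, 77.1), 1))  # 20.8
```

With a bias and one-hot phone columns, the LR decision rule reduces to "GOP below a per-phone cut-off" with a shared slope, so GOP plus phone identity plays a role close to the phone-specific thresholds. This may help explain why the two methods score similarly (64.4% vs 63.8%), with the gain coming from the extra features.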
