The goodness of pronunciation algorithm applied to disordered speech

In this paper, we report on a study aimed at automatically detecting phoneme-level mispronunciations in 32 French speakers with unilateral facial palsy at four different clinical severity grades. We sought to determine whether the Goodness of Pronunciation (GOP) algorithm, which is commonly used in Computer-Assisted Language Learning systems to detect learners’ individual errors, could also detect segmental deviations in disordered speech. For this purpose, speech read by the 32 speakers was aligned and GOP scores were computed for each phone realization. The highest scores, which indicate large dissimilarities with standard phone realizations, were obtained for the most severely impaired speakers. The corresponding speech subset was manually transcribed at the phone level, and 8.3% of the phones differed from the standard pronunciations extracted from our lexicon. The GOP technique detected 70.2% of the mispronunciations, with equal false rejection and false acceptance rates of about 30%. The phone substitutions detected by the algorithm confirmed that some of the speakers have difficulty producing bilabial plosives, and showed that other sounds, such as sibilants, are also prone to mispronunciation. Another interesting finding was that speakers diagnosed with the same pathology grade do not necessarily share the same pronunciation issues.

Index Terms: pronunciation automatic assessment, Goodness of Pronunciation, disordered speech
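To make the scoring and the equal-error operating point concrete, here is a minimal Python sketch. It assumes the commonly cited formulation of GOP, in which the score of a phone is the absolute difference between the log-likelihood of the aligned segment under the expected phone and the best log-likelihood under any phone, divided by the segment length in frames. The abstract does not spell out the exact variant or thresholding scheme used in the study, so the function names, the single global threshold, and the toy values below are all illustrative assumptions.

```python
def gop_score(forced_loglik, free_loglik, num_frames):
    """GOP for one phone segment (assumed formulation, see lead-in).

    forced_loglik: log-likelihood of the segment given the expected phone.
    free_loglik:   best log-likelihood of the segment over all phones.
    num_frames:    segment duration in frames, used as normalization.
    Higher values mean a larger mismatch with the expected phone.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return abs(forced_loglik - free_loglik) / num_frames


def error_rates(scores, is_mispronounced, threshold):
    """Return (false_rejection_rate, false_acceptance_rate).

    Assumed decision rule: a phone is rejected when score > threshold.
    False rejection: a correctly pronounced phone is rejected.
    False acceptance: a mispronounced phone is accepted.
    """
    correct = [s for s, bad in zip(scores, is_mispronounced) if not bad]
    wrong = [s for s, bad in zip(scores, is_mispronounced) if bad]
    frr = sum(s > threshold for s in correct) / len(correct)
    far = sum(s <= threshold for s in wrong) / len(wrong)
    return frr, far


def equal_error_threshold(scores, is_mispronounced):
    """Pick the observed score at which the two error rates are closest."""
    def gap(threshold):
        frr, far = error_rates(scores, is_mispronounced, threshold)
        return abs(frr - far)
    return min(sorted(set(scores)), key=gap)


# Hypothetical toy data: GOP scores and manual labels for six phones.
toy_scores = [0.2, 0.5, 1.4, 0.4, 2.0, 1.1]
toy_labels = [False, False, False, True, True, True]
toy_threshold = equal_error_threshold(toy_scores, toy_labels)
toy_frr, toy_far = error_rates(toy_scores, toy_labels, toy_threshold)
```

Sweeping the threshold over the observed scores is the simplest way to find an operating point where false rejections and false acceptances balance, which is how an "equal rate" such as the one reported above is usually obtained. A real system might instead tune one threshold per phone.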
