Improvement of a speech recognizer for standardized medical assessment of children's speech by integration of prior knowledge

Recognizing children's speech is more difficult than recognizing adult speech, and the problem is amplified for children with articulation disorders such as cleft lip and palate (CLP). In this work we improved our automatic speech recognition system by integrating prior knowledge of two kinds: test-dependent language modeling and age-dependent acoustic modeling. The two approaches are finally combined into a set of test- and age-dependent recognizers. We evaluated the system on a dataset of 35 children with CLP and found significant improvements. Our baseline system achieved a negative word accuracy (WA) of −11.0%. Extended language modeling raised the WA to 27.5%, and the age-dependent recognition system improved it further to 42.6%. With these significant improvements in WA, automatic detection and identification of specific words becomes possible. We have thus taken a first step towards speech assessment on the word and subword level.
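A negative WA may look surprising at first. The abstract does not state the formula, so the following minimal sketch assumes the usual definition, WA = 100 · (N − S − D − I) / N, where N is the number of reference words and S, D, and I are the substitutions, deletions, and insertions in the best alignment. Under that definition, WA drops below zero as soon as the recognizer makes more errors than there are reference words, typically through many insertions. The function names and the example words are illustrative and not taken from the paper.

```python
def word_errors(reference, hypothesis):
    """Minimum number of substitutions, deletions and insertions
    needed to turn the reference into the hypothesis (word-level
    Levenshtein distance)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy in percent, assuming WA = 100 * (N - S - D - I) / N."""
    n = len(reference)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 100.0 * (n - word_errors(reference, hypothesis)) / n


# Hypothetical example: two reference words, five recognized words,
# none of them correct, so errors outnumber reference words.
reference = ["ball", "tree"]
hypothesis = ["a", "bowl", "and", "three", "too"]
print(word_accuracy(reference, hypothesis))
```

Because insertions are counted as errors, WA has no lower bound, whereas a measure that ignores insertions (word correctness) cannot fall below zero.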