Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult Speech

This paper describes the continued development of a system that provides early assessment of speech development issues in children and better triaging to professional services. Whilst corpora of children’s speech are increasingly available, recognition of disordered children’s speech remains a data-scarce task. Transfer learning methods have been shown to be effective at leveraging out-of-domain data to improve automatic speech recognition (ASR) performance in similar data-scarce applications. This paper combines transfer learning with previously developed methods for constrained decoding based on expert speech pathology knowledge and knowledge of the target text. The results show that transfer learning with out-of-domain adult speech can improve phoneme recognition for disordered children’s speech. Specifically, a Deep Neural Network (DNN) trained on adult speech and fine-tuned on a corpus of disordered children’s speech achieved a phoneme error rate (PER) of 14.2%, compared with 16.3% for a DNN trained on a children’s corpus. Furthermore, this fine-tuned DNN also improved on the Hierarchical Neural Network-based acoustic model previously used by the system, which had a PER of 19.3%. We close with a discussion of our planned future developments of the system.
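The abstract does not say how PER is computed. As a minimal sketch, assuming the conventional definition (phoneme-level edit distance divided by the length of the reference transcription), the metric and the relative size of the reported improvements could be worked out as follows. The function names and the example phoneme sequences are illustrative and do not come from the paper.

```python
def phoneme_error_rate(reference, hypothesis):
    """Assumed definition: (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one phoneme")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j phonemes of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference phonemes
        for j, hyp_ph in enumerate(hypothesis, start=1):
            cost = 0 if ref_ph == hyp_ph else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def relative_reduction(baseline, improved):
    """Fraction of the baseline error rate removed by the improved system."""
    return (baseline - improved) / baseline


# Illustrative only: "cat" produced as "tat" (one substitution in three phonemes).
per_example = phoneme_error_rate(["k", "ae", "t"], ["t", "ae", "t"])

# Relative reductions implied by the PER figures quoted in the abstract.
vs_child_dnn = relative_reduction(16.3, 14.2)
vs_hierarchical = relative_reduction(19.3, 14.2)
```

Under these assumptions, the absolute drop from 16.3% to 14.2% corresponds to a relative PER reduction of roughly 12.9%, and the drop from 19.3% to 14.2% to roughly 26.4%.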
