A survey about ASR for children

This paper surveys the state of the art in automatic speech recognition (ASR) for children’s speech. ASR for children is an active research topic, and databases of children’s speech are needed to train and test such systems. The first part of this paper describes the most relevant databases of children’s speech. Less speech data is available for children than for adults, and speech from preschool children is rarer still. The second part summarizes the common techniques for recognizing children’s speech. Most investigations of children’s ASR focus on the acoustic model; the common methods are described first, followed by approaches concerning the lexical and language models. In an extensive literature search, we collected papers investigating ASR for children. Several such studies have been carried out, but because data from preschool children are scarce, only a few address this age group. This is illustrated with statistics on the age of the children in past studies.
