Evidence of phonological processes in automatic recognition of children's speech

Automatic speech recognition (ASR) for children’s speech is more difficult than for adults’ speech. One plausible explanation is that the additional ASR errors arise from predictable phonological effects associated with language acquisition. We describe phone recognition experiments on hand-labelled data for children aged between 5 and 9 years. A comparison of the resulting confusion matrices with those for adult speech (TIMIT) shows increased phone substitution rates for children, which correspond to some extent to established phonological phenomena. However, these substitutions account for only a relatively small proportion of the overall errors. This suggests that attempts to improve ASR accuracy on children’s speech by accommodating these phenomena, for example by changing the pronunciation dictionary, cannot solve the whole problem.
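The confusion-matrix comparison described above can be illustrated with a minimal sketch. It assumes that reference and recognised phones have already been aligned into (reference, hypothesis) pairs; the function names, the `min_increase` threshold, and the toy counts are hypothetical and are not taken from the paper's experiments or data.

```python
from collections import Counter, defaultdict


def confusion_counts(aligned_pairs):
    """Build a phone confusion matrix as nested counts: counts[ref][hyp]."""
    counts = defaultdict(Counter)
    for ref, hyp in aligned_pairs:
        counts[ref][hyp] += 1
    return counts


def substitution_rates(counts):
    """Return {(ref, hyp): rate} for every off-diagonal cell.

    The rate is the fraction of occurrences of `ref` recognised as `hyp`.
    """
    rates = {}
    for ref, row in counts.items():
        total = sum(row.values())
        for hyp, n in row.items():
            if hyp != ref:
                rates[(ref, hyp)] = n / total
    return rates


def increased_substitutions(child_rates, adult_rates, min_increase=0.05):
    """List substitutions whose rate is higher for children than for adults.

    `min_increase` is an arbitrary illustrative threshold, not a value
    from the paper. Results are sorted by the size of the increase.
    """
    flagged = []
    for pair, child_rate in child_rates.items():
        increase = child_rate - adult_rates.get(pair, 0.0)
        if increase >= min_increase:
            flagged.append((pair, increase))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Invented toy alignments (NOT real data): /r/ recognised as /w/ mimics
# gliding, a commonly described developmental phonological process.
child_pairs = [("r", "r")] * 6 + [("r", "w")] * 4 + [("s", "s")] * 9 + [("s", "th")]
adult_pairs = [("r", "r")] * 19 + [("r", "w")] + [("s", "s")] * 10

child_rates = substitution_rates(confusion_counts(child_pairs))
adult_rates = substitution_rates(confusion_counts(adult_pairs))
flagged = increased_substitutions(child_rates, adult_rates)

for (ref, hyp), increase in flagged:
    print(f"/{ref}/ -> /{hyp}/: substitution rate higher by {increase:.2f}")
```

Each flagged pair could then be checked by hand against documented phonological processes. Substitutions that match none of them would belong to the remaining share of errors that the abstract says these phenomena do not explain.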
