An analysis of the causes of increased error rates in children’s speech recognition

Previous studies have shown that children’s speech is more difficult for machines to recognise than adults’ speech. This paper presents the results of experiments that investigate how recognition performance varies within a small population of children. Results suggest that recogniser performance on a child’s speech is well correlated with a teacher’s assessment of the child’s speaking proficiency. For children whose speech is judged to be good, performance is close to that of adults, but error rates increase by a factor of 4 for children with ‘poor’ speech. An analysis of actual pronunciations for children with poor speech shows significant divergence from the ‘idealised’ baseforms in a pronunciation dictionary. It is demonstrated that some improvements can be gained through the use of customised dictionaries. Finally, the effects of bandwidth reduction on recogniser performance are investigated for a range of children with differing speaking styles.
