Advances in Children's Speech Recognition within an Interactive Literacy Tutor

In this paper, we present recent advances in acoustic and language modeling that improve recognition performance when children read aloud within digital books. First, we extend previous work by incorporating cross-utterance word history information and dynamic n-gram language modeling. By additionally incorporating Vocal Tract Length Normalization (VTLN), Speaker-Adaptive Training (SAT), and iterative unsupervised structural maximum a posteriori linear regression (SMAPLR) adaptation, we demonstrate a 54% reduction in word error rate. Next, we show how data from children's read-aloud sessions can be used to improve accuracy in a spontaneous story summarization task, where we obtain an error reduction of 15% over previously published results. Finally, we describe a novel real-time implementation of our research system that incorporates time-adaptive acoustic and language modeling.
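
To make the reported metric concrete, the following is a minimal sketch of how word error rate and a reduction in it are commonly computed. It assumes the standard edit-distance definition of word error rate and assumes the reported reductions are relative; the abstract states neither. The function names and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer
```

As an illustration with made-up values, a system that lowers word error rate from 0.20 to 0.092 yields `relative_reduction(0.20, 0.092)` of about 0.54, that is, a 54% relative reduction.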
