Exploiting Predictable Response Training to Improve Automatic Recognition of Children's Spoken Responses

The unpredictability of spoken responses by young children (6-7 years old) makes them problematic for automatic speech recognizers. Aist and Mostow proposed predictable response training to improve automatic recognition of children's free-form spoken responses. We apply this approach in the context of Project LISTEN's Reading Tutor to the task of teaching children an important reading comprehension strategy, namely to make up their own questions about a text while reading it. We show how to use knowledge about strategy instruction and the story text to generate a language model that predicts the questions children speak during comprehension instruction. We evaluated this model on a previously unseen test set of 18 utterances, totaling 137 words, spoken by 11 second-grade children in response to prompts the Reading Tutor inserted as they read. Compared with a baseline trigram language model that does not incorporate this knowledge, speech recognition using the generated language model achieved concept recall five times higher, a difference large enough to be statistically significant despite the small sample size.
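The abstract reports concept recall without defining it. As a rough illustration only, the sketch below assumes concept recall is the fraction of reference concepts (here, key words from the transcript) that also appear in the recognizer's output; the paper's actual definition may differ. The function name `concept_recall` and all example utterances are hypothetical and are not data from the study.

```python
def concept_recall(reference_concepts, hypothesis_words):
    """Fraction of reference concepts that appear in the recognizer output.

    Assumed definition for illustration; not taken from the paper.
    """
    reference = {c.lower() for c in reference_concepts}
    if not reference:
        return 0.0
    hypothesis = {w.lower() for w in hypothesis_words}
    return len(reference & hypothesis) / len(reference)


# Hypothetical concepts a child's question is expected to contain.
reference = ["why", "dog", "run"]

# Hypothetical recognizer outputs under two language models.
baseline_hyp = "the dog was in the".split()
generated_hyp = "why did the dog run away".split()

baseline = concept_recall(reference, baseline_hyp)    # only "dog" recovered
generated = concept_recall(reference, generated_hyp)  # all three recovered
print(f"baseline={baseline:.2f} generated={generated:.2f}")
```

Under this assumed definition, a language model that expects question words and story vocabulary can recover more of the intended concepts even when the full word sequence is not recognized exactly.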
