Robustness optimization of a speech interface for child-directed embedded language tutoring

This contribution describes the robustness evaluation and optimization of a speech interface suitable for embedded language tutoring, with a special focus on children's speech. The baseline algorithms are derived from the pronunciation tutoring system AzAR, which is directed at adult learners of German. The first prototype, LiSA (2008), directed at young children from the age of 3, is currently being evaluated and optimized, mainly addressing the following issues: (a) the challenge of ASR-based pronunciation assessment for children's speech, (b) the handling of noise and reverberation in an embedded application scenario, and (c) the extraction of additional information such as age or gender. The article summarizes evaluation results for the speech recognizer in laboratory and real-world room environments.
