Multimodal speech and audio user interfaces for K-12 outreach

Elementary school children have short attention spans. This paper describes three multimodal speech and audio user interfaces that captured and held the attention of a few dozen elementary-school and high-school children during a two-day university open house. The Speech Recognition Game demonstrated an isolated word recognizer with a rapidly won game, in which children were challenged to get ten words in a row correctly recognized. The Audio Easter Egg Hunt demonstrated our timeliner multimedia analytics platform with a faster-than-real-time search through orchestral music for audio anomalies (cuckoo clocks, motorcycles, etc.). Finally, at the Intonation Station, children had to pick the pitch contour that would help a friendly troll hunt dragons successfully in the city of Champaign. Results suggest that competition, collaboration, and other forms of social interaction may motivate children more than prizes.
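The Speech Recognition Game's mechanic is simple enough to sketch in a few lines. The loop below is a hypothetical illustration only: the recognize() stub, the word list, and the simulated 90% per-word accuracy are assumptions made for demonstration, not the recognizer or vocabulary used at the open house.

    import random

    VOCABULARY = ["cat", "dog", "tree", "ball", "house"]  # illustrative word list, not the demo's

    def recognize(prompt):
        # Stand-in for the isolated word recognizer: returns the prompted
        # word 90% of the time, otherwise a random confusion (assumed rate).
        return prompt if random.random() < 0.9 else random.choice(VOCABULARY)

    def play(target_streak=10):
        # A child wins by getting target_streak words in a row correctly
        # recognized; any recognition error resets the streak to zero.
        streak = 0
        while streak < target_streak:
            prompt = random.choice(VOCABULARY)
            heard = recognize(prompt)
            streak = streak + 1 if heard == prompt else 0
            print("say '%s' -> heard '%s' (streak %d)" % (prompt, heard, streak))
        print("Ten in a row -- you win!")

    play()

Under these assumed numbers, reaching a streak of ten takes roughly twenty utterances on average, which is consistent with the abstract's description of a rapidly won game.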
