SmartKom-English: From Robust Recognition to Felicitous Interaction

This chapter describes the English-language SmartKom-Mobile system and related research. We explain the work required to support a second language in SmartKom and the design of the English speech recognizer. We then discuss research carried out on signal processing methods for robust speech recognition and on language analysis using the Embodied Construction Grammar formalism. Finally, the results of human-subject experiments using a novel Wizard and Operator model are analyzed with an eye to creating more felicitous interaction in dialogue systems.
