CASIS: a context-aware speech interface system

In this paper, we propose a robust natural language interface called CASIS for controlling devices in an intelligent environment. CASIS is novel in that it integrates physical context, acquired from sensors embedded in the environment, with traditionally used context to reduce the system error rate and to disambiguate deictic references and elliptical inputs. The n-best list of the speech recognizer is re-ranked by a score computed with a Bayesian network over information from the input utterance and the context. In our prototype system, which uses device states, brightness, speaker location, chair occupancy, speech direction, and action history as context, the system error rate is reduced by 41% compared to a baseline system that does not leverage context information.
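The re-ranking step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names (`rerank`, `context_prob`) and the linear score combination are assumptions, since CASIS computes its score with a Bayesian network over utterance and context variables.

```python
# Sketch of context-aware n-best re-ranking. The real system derives the
# context probability from a Bayesian network over device states, brightness,
# speaker location, etc.; here a plain lookup table stands in for it.

def rerank(nbest, context_prob, alpha=0.5):
    """Re-order (hypothesis, asr_score) pairs by a combined score.

    nbest        -- list of (hypothesis, asr_score) from the recognizer
    context_prob -- maps hypothesis -> plausibility given current context
    alpha        -- weight of the recognizer score vs. the context score
    """
    def combined(item):
        hyp, asr_score = item
        return alpha * asr_score + (1 - alpha) * context_prob.get(hyp, 0.0)
    return sorted(nbest, key=combined, reverse=True)

# Example: "turn on the light" is acoustically second-best, but context
# (say, the speaker faces a single lamp in a dark room) makes it more
# plausible, so it wins after re-ranking.
nbest = [("turn on the lights", 0.60), ("turn on the light", 0.55)]
context = {"turn on the light": 0.9, "turn on the lights": 0.2}
best = rerank(nbest, context)[0][0]  # -> "turn on the light"
```

The key design point, as in the paper, is that the acoustically top-ranked hypothesis is not necessarily the one executed: context can promote a lower-ranked hypothesis that better fits the state of the environment.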
