Processing and fusing multiple heterogeneous information sources in multimodal dialog systems

Context-aware dialog systems must be able to process highly heterogeneous information sources and user input modes. In this paper we propose a method to fuse multimodal inputs into a unified representation. This representation allows the dialog manager of the system to find the best interaction strategy and select the next system response. We show the applicability of our proposal through the implementation of a dialog system that considers spoken input, tactile input, and information related to the context of the interaction with its users. The context information comprises the user's intention and emotional state detected during the dialog (internal context), and the user's location (external context).
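The fusion scheme described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: all class, field, and function names (`UnifiedInput`, `fuse`, `select_response`) and the toy response policy are assumptions introduced here to show how spoken input, tactile input, and internal/external context might be merged into one representation that a dialog manager consults.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical unified representation of one user turn.
# Fields mirror the modalities named in the abstract:
# spoken and tactile input, plus internal context (intention,
# emotion) and external context (location).
@dataclass
class UnifiedInput:
    spoken_text: Optional[str] = None
    tactile_selection: Optional[str] = None
    intention: Optional[str] = None   # internal context
    emotion: Optional[str] = None     # internal context
    location: Optional[str] = None    # external context

def fuse(spoken: Optional[str] = None,
         tactile: Optional[str] = None,
         context: Optional[dict] = None) -> UnifiedInput:
    """Merge the heterogeneous inputs of one turn into a single frame."""
    ctx = context or {}
    return UnifiedInput(
        spoken_text=spoken,
        tactile_selection=tactile,
        intention=ctx.get("intention"),
        emotion=ctx.get("emotion"),
        location=ctx.get("location"),
    )

def select_response(state: UnifiedInput) -> str:
    """Toy dialog-manager policy: pick the next system action
    from the fused representation (illustrative rules only)."""
    if state.emotion == "frustrated":
        return "offer_human_operator"
    if state.tactile_selection is not None:
        return f"confirm_{state.tactile_selection}"
    if state.spoken_text:
        return "process_spoken_request"
    return "ask_clarification"
```

A dialog manager built this way can adapt its strategy per turn, e.g. `select_response(fuse(spoken="book a table", context={"emotion": "frustrated"}))` escalates to a human operator regardless of the spoken request.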
