Fusion engines for multimodal input: a survey

Fusion engines are fundamental components of multimodal interactive systems: they interpret input streams whose meaning can vary according to context, task, user and time. While other surveys have considered multimodal interactive systems broadly, we focus more closely on the design, specification, construction and evaluation of fusion engines. We first introduce terminology and set out the major challenges that fusion engines are designed to address. We then trace the history of work in the field using the BRETAM model, and classify existing approaches to fusion according to application type, fusion principles and temporal aspects. Finally, we set out challenges for future work in the field, including software frameworks, quantitative evaluation, machine learning and adaptation.
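As a concrete illustration of the temporal aspect of fusion, the sketch below pairs speech and pointing events that fall within a fixed time window, in the spirit of Bolt's "Put-that-there". It is a minimal sketch only: the Event structure, the fuse function and the 1.5 s window are assumptions made for this example, not the mechanism of any particular engine surveyed here.

```python
from dataclasses import dataclass

# Hypothetical event type for illustration; real fusion engines consume
# richer, recognizer-specific structures (e.g. typed feature structures
# in unification-based approaches).
@dataclass
class Event:
    modality: str      # "speech" or "gesture"
    payload: str       # recognized token, e.g. a command or an object id
    timestamp: float   # seconds since session start

FUSION_WINDOW = 1.5    # illustrative threshold: max gap (s) between events to fuse

def fuse(events: list[Event]) -> list[tuple[Event, Event]]:
    """Pair each speech event with gesture events inside the temporal
    window -- the simplest 'temporal proximity' fusion criterion."""
    speech = [e for e in events if e.modality == "speech"]
    gestures = [e for e in events if e.modality == "gesture"]
    fused = []
    for s in speech:
        for g in gestures:
            if abs(s.timestamp - g.timestamp) <= FUSION_WINDOW:
                fused.append((s, g))
    return fused

# The canonical "put that there" interaction: a deictic utterance
# resolved by two pointing gestures close in time.
events = [
    Event("speech",  "put that there", 0.0),
    Event("gesture", "point:obj42",    0.3),
    Event("gesture", "point:loc(5,2)", 1.1),
]
for s, g in fuse(events):
    print(f"fused '{s.payload}' with {g.payload}")
```

A window-based criterion like this is only one point in the design space: the approaches surveyed also weigh context, task and user state when deciding which events belong together.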
