Multimodal Interfaces: A Survey of Principles, Models and Frameworks

The grand challenge of multimodal interface creation is to build reliable processing systems able to analyze and understand multiple communication means in real time. This raises a number of associated issues covered by this chapter, such as the fusion of heterogeneous data types, architectures for real-time processing, dialog management, machine learning for multimodal interaction, modeling languages, and frameworks. The chapter does not attempt to cover exhaustively all the issues related to the creation of multimodal interfaces, and some active topics, such as error handling, have been left aside. The chapter starts with the features and advantages associated with multimodal interaction, focusing on particular findings and guidelines as well as the cognitive foundations underlying multimodal interaction. It then turns to the driving theoretical principles, time-sensitive software architectures, and multimodal fusion and fission. Modeling of multimodal interaction, together with tools enabling the rapid creation of multimodal interfaces, is presented next. The chapter concludes with an outline of the current state of multimodal interaction research in Switzerland and a summary of the major future challenges in the field.
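
As a rough illustration of what the fusion of heterogeneous, time-stamped input entails, the following minimal Python sketch (all names hypothetical, not drawn from any system surveyed here) pairs a speech command with a pointing gesture when their timestamps fall within a small window, in the spirit of the classic "Put-That-There" style of interaction:

```python
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str     # e.g. "speech" or "gesture"
    payload: str      # recognized command or pointed-at object
    timestamp: float  # seconds since session start

FUSION_WINDOW = 1.5   # max time distance (s) for two events to be fused

def fuse(events):
    """Pair each speech event with the closest gesture event in time,
    provided the two fall within FUSION_WINDOW of each other."""
    speech = [e for e in events if e.modality == "speech"]
    gestures = [e for e in events if e.modality == "gesture"]
    fused = []
    for s in speech:
        candidates = [g for g in gestures
                      if abs(g.timestamp - s.timestamp) <= FUSION_WINDOW]
        if candidates:
            g = min(candidates, key=lambda g: abs(g.timestamp - s.timestamp))
            fused.append((s.payload, g.payload))
    return fused

# "Put that there": speech carries the verb, the gesture resolves the deictic.
events = [
    InputEvent("speech", "move", 0.2),
    InputEvent("gesture", "object:lamp", 0.5),
]
print(fuse(events))  # [('move', 'object:lamp')]
```

Real fusion engines must of course cope with recognition uncertainty, more than two modalities, and streaming rather than batch input, which is precisely what makes the time-sensitive architectures discussed in this chapter necessary.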
