Multimodal interfaces: Challenges and perspectives

The development of interfaces has been a technology-driven process. However, newly developed multimodal interfaces rely on recognition-based technologies that must interpret human speech, gesture, gaze, movement patterns, and other behavioral cues. As a result, interface design requires a human-centered approach. In this paper we review the major approaches to multimodal human-computer interaction, giving an overview of user and task modeling and of multimodal fusion. We highlight the challenges, open issues, and future trends in multimodal interfaces research.
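Multimodal fusion can take place at the feature level or at the decision level. The sketch below illustrates decision-level ("late") fusion under the assumption that each modality's recognizer emits a posterior score per candidate command; the modality names, labels, and weights are hypothetical, chosen only for illustration, and do not come from any specific system covered here.

```python
# A minimal sketch of decision-level ("late") multimodal fusion.
# Assumption: each recognizer outputs {label: posterior score}; the
# weights express how much each modality is trusted.

def late_fusion(modality_scores, weights):
    """Combine per-modality posterior scores with a weighted sum.

    modality_scores: dict mapping modality name -> {label: score}
    weights: dict mapping modality name -> float (should sum to 1)
    Returns the label with the highest fused score.
    """
    fused = {}
    for modality, scores in modality_scores.items():
        w = weights.get(modality, 0.0)
        for label, score in scores.items():
            fused[label] = fused.get(label, 0.0) + w * score
    return max(fused, key=fused.get)

# Hypothetical recognizer outputs for a "put that there"-style command:
speech = {"move": 0.7, "delete": 0.3}
gesture = {"move": 0.6, "delete": 0.4}
print(late_fusion({"speech": speech, "gesture": gesture},
                  {"speech": 0.6, "gesture": 0.4}))  # -> "move"
```

A weighted sum is only the simplest fusion rule; deployed systems additionally align modalities in time and exploit cross-modal dependencies, for example to mutually disambiguate recognition errors across speech and gesture.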
