Multimodal Human Computer Interaction: A Survey

In this paper, we review the major approaches to multimodal human-computer interaction (MMHCI), giving an overview of the field from a computer vision perspective. In particular, we focus on body, gesture, and gaze interaction, and on affective interaction through facial expression recognition and emotion recognition from audio. We also discuss user and task modeling and multimodal fusion, highlighting challenges, open issues, and emerging applications for MMHCI research.
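One of the recurring techniques in this area is decision-level ("late") fusion, in which each modality is classified independently and the resulting class posteriors are combined. The sketch below is purely illustrative, not a method from any specific system surveyed here; the modality names, class set, and weights are hypothetical.

```python
# Decision-level (late) multimodal fusion: a minimal, illustrative sketch.
# Each modality classifier outputs class posteriors; we combine them with
# a per-modality confidence weight and renormalize.

def fuse_late(posteriors, weights):
    """Weighted sum of per-modality class posteriors, renormalized to sum to 1."""
    n_classes = len(next(iter(posteriors.values())))
    fused = [0.0] * n_classes
    for modality, probs in posteriors.items():
        w = weights.get(modality, 1.0)  # default weight if unspecified
        for i, p in enumerate(probs):
            fused[i] += w * p
    total = sum(fused)
    return [f / total for f in fused]

# Hypothetical outputs of a facial-expression and a speech-emotion
# classifier over three emotion classes (neutral, happy, angry).
face = [0.2, 0.7, 0.1]
audio = [0.1, 0.5, 0.4]
fused = fuse_late({"face": face, "audio": audio},
                  weights={"face": 0.6, "audio": 0.4})
label = max(range(len(fused)), key=fused.__getitem__)  # index of top class
```

The weights here stand in for modality reliability, which real systems typically estimate from validation data or from per-frame signal quality; feature-level ("early") fusion, by contrast, concatenates modality features before a single classifier.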
