Toward multimodal human-computer interface

Recent advances in various signal processing technologies, coupled with an explosion in available computing power, have given rise to a number of novel human-computer interaction (HCI) modalities: speech, vision-based gesture recognition, eye tracking, electroencephalography (EEG), and others. Successfully embodying these modalities in an interface has the potential to ease the HCI bottleneck that has become noticeable alongside advances in computing and communication. It has also become increasingly evident that the difficulties encountered in analyzing and interpreting individual sensing modalities may be overcome by integrating them into a multimodal human-computer interface. We examine several promising directions toward achieving multimodal HCI. We consider some of the emerging novel input modalities for HCI and the fundamental issues in integrating them at various levels, from early signal level to intermediate feature level to late decision level. We discuss the different computational approaches that may be applied at each level of modality integration. We also briefly review several demonstrated multimodal HCI systems and applications. Despite all the recent developments, it is clear that further research is needed on interpreting and fusing multiple sensing modalities in the context of HCI. This research can benefit from many disparate fields of study that increase our understanding of the different human communication modalities and their potential role in HCI.
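To make the integration levels concrete, the sketch below illustrates the late, decision-level end of that spectrum: each modality runs its own recognizer over a shared command vocabulary, and only the resulting posterior distributions are combined, here with a weighted product rule under an independent-modality assumption. The command set, modality names, and reliability weights are illustrative assumptions, not taken from any specific system discussed here.

```python
# A minimal sketch of decision-level fusion: each modality produces its own
# posterior distribution over a shared set of commands, and the distributions
# are combined only at the very end. Names and numbers below are hypothetical.

from math import prod

COMMANDS = ["select", "move", "rotate", "delete"]  # illustrative vocabulary

def fuse_decisions(modality_posteriors, weights=None):
    """Combine per-modality posteriors with a weighted product rule.

    modality_posteriors: dict mapping modality name -> dict of
        command -> P(command | that modality's observation).
    weights: optional dict of modality name -> reliability exponent
        (e.g., lowered for speech in a noisy room).
    Returns a normalized fused distribution over COMMANDS.
    """
    weights = weights or {m: 1.0 for m in modality_posteriors}
    scores = {}
    for cmd in COMMANDS:
        # Weighted product rule: treat modalities as independent, with the
        # exponent acting as a per-modality confidence; a small floor keeps
        # one missing estimate from zeroing out a command entirely.
        scores[cmd] = prod(
            modality_posteriors[m].get(cmd, 1e-9) ** weights[m]
            for m in modality_posteriors
        )
    total = sum(scores.values())
    return {cmd: s / total for cmd, s in scores.items()}

if __name__ == "__main__":
    speech = {"select": 0.5, "move": 0.3, "rotate": 0.1, "delete": 0.1}
    gesture = {"select": 0.2, "move": 0.6, "rotate": 0.1, "delete": 0.1}
    fused = fuse_decisions({"speech": speech, "gesture": gesture},
                           weights={"speech": 0.5, "gesture": 1.0})
    print(max(fused, key=fused.get), fused)  # gesture outweighs noisy speech
```

At the other end of the spectrum, signal- or feature-level fusion would instead combine raw or intermediate representations (e.g., concatenated acoustic and visual features fed to a single recognizer) before any per-modality decision is made, trading robustness to single-modality failure for the ability to exploit cross-modal correlations.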
