Dynamic Bayesian networks for information fusion with applications to human-computer interfaces

Recent advances in various display and virtual technologies, coupled with an explosion in available computing power, have given rise to a number of novel human-computer interaction (HCI) modalities: speech, vision-based gesture recognition, eye tracking, EEG, and others. However, despite the abundance of novel interaction devices, the naturalness and efficiency of HCI have remained low. This is due in particular to the lack of robust techniques for interpreting sensory data. To address the task of interpreting single and multiple interaction modalities, this dissertation establishes a novel probabilistic approach based on dynamic Bayesian networks (DBNs). As a generalization of the successful hidden Markov models (HMMs), DBNs are a natural basis for the general temporal action interpretation task. The interpretation of single or multiple interacting modalities can then be viewed as a Bayesian inference task. In this work, three complex DBN models are introduced: mixtures of DBNs, mixed-state DBNs, and coupled HMMs. In-depth study of these models yields efficient approximate inference and parameter learning techniques applicable to a wide variety of problems. Experimental validation of the proposed approaches in the domains of gesture and speech recognition confirms the models' applicability to both unimodal and multimodal interpretation tasks.
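To make the phrase "interpretation as Bayesian inference" concrete, the following is a minimal sketch of exact forward filtering in a discrete HMM, the simplest special case of a DBN. It is not one of the dissertation's models (mixtures of DBNs, mixed-state DBNs, coupled HMMs) or its approximate inference schemes. The function name `forward_filter`, the two-state "rest"/"gesture" setup, and all probability values are hypothetical and chosen only for illustration.

```python
import math


def forward_filter(prior, transition, emission, observations):
    """Forward filtering for a discrete HMM (illustrative sketch).

    prior[i]         -- P(state_1 = i)
    transition[i][j] -- P(state_t = j | state_{t-1} = i)
    emission[j][o]   -- P(obs_t = o | state_t = j)
    Returns the filtered posteriors P(state_t | obs_1..t) for each t
    and the log-likelihood of the whole observation sequence.
    """
    n = len(prior)
    belief = list(prior)
    posteriors = []
    log_likelihood = 0.0
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: propagate the previous posterior through the dynamics.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(n))
                for j in range(n)
            ]
        # Update: weight each state by how well it explains the observation.
        unnormalized = [belief[j] * emission[j][obs] for j in range(n)]
        norm = sum(unnormalized)
        log_likelihood += math.log(norm)
        belief = [u / norm for u in unnormalized]
        posteriors.append(belief)
    return posteriors, log_likelihood


# Hypothetical example: state 0 = "rest", state 1 = "gesture";
# observation 0 = "hand still", observation 1 = "hand moving".
prior = [0.5, 0.5]
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.8, 0.2], [0.3, 0.7]]
posteriors, log_likelihood = forward_filter(
    prior, transition, emission, observations=[1, 1, 0]
)
```

In a recognition setting, one such model would typically be trained per action class, and an observed sequence assigned to the class whose model gives the highest log-likelihood. Coupling several such chains, one per modality, is what makes exact inference expensive and motivates approximate techniques.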
