Understanding Human Behaviors Based on Eye-Head-Hand Coordination

Action recognition has traditionally focused on processing fixed-camera observations while ignoring non-visual information. In this paper, we explore the dynamic properties of the movements of different body parts in natural tasks: eye, head, and hand movements are tightly coupled with the ongoing task. In light of this, our method takes an agent-centered view and incorporates an extensive description of eye-head-hand coordination. By tracking the course of gaze and head movements, our approach uses gaze and head cues to detect agent-centered attention switches, which are then used to segment an action sequence into action units. Based on recognizing these action primitives, parallel hidden Markov models are applied to model and integrate the probabilistic sequences of action units from different body parts. An experimental system is built to recognize human behaviors in three natural tasks: "unscrewing a jar", "stapling a letter", and "pouring water", demonstrating the effectiveness of the approach.
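The segmentation idea above can be sketched in code. This is a minimal illustration, not the paper's actual method: it assumes gaze arrives as (x, y) samples, and treats any sample where gaze speed exceeds a threshold as a saccade marking an agent-centered attention switch. The function names and the threshold value are hypothetical.

```python
# Illustrative sketch: segment an action stream at attention switches,
# assuming a switch is marked by a saccade (gaze speed above a threshold).
# All names and the threshold are assumptions, not from the paper.

def gaze_speeds(gaze):
    """Per-sample gaze speed from consecutive (x, y) points."""
    return [((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
            for (x1, y1), (x2, y2) in zip(gaze, gaze[1:])]

def segment_at_attention_switches(gaze, threshold=5.0):
    """Split sample indices into action units at high-speed gaze samples."""
    units, start = [], 0
    for i, speed in enumerate(gaze_speeds(gaze), start=1):
        if speed > threshold:      # saccade -> attention switch
            if i > start:
                units.append((start, i))
            start = i
    units.append((start, len(gaze)))   # close the final action unit
    return units

# Example: a steady fixation, one large gaze jump, then a second fixation
gaze = [(0, 0), (0, 1), (1, 1), (50, 50), (50, 51), (51, 51)]
print(segment_at_attention_switches(gaze))  # -> [(0, 3), (3, 6)]
```

Each resulting (start, end) index range would correspond to one candidate action unit, which could then be labeled and fed to the per-body-part HMM streams.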
