Visual Modeling of Dynamic Gestures Using 3D Appearance and Motion Features

We present a novel 3-D gesture recognition scheme that combines the 3-D appearance of the hand with the motion dynamics of the gesture to classify manipulative and controlling gestures. Our method does not directly track the hand. Instead, we take an object-centered approach that efficiently computes 3-D appearance using a region-based coarse stereo matching algorithm. Motion cues are captured by differentiating the appearance feature. An unsupervised learning scheme captures the cluster structure of these features. The image sequence of a gesture is then converted to a series of symbols, each indicating the cluster identity of one image pair. Two schemes, forward HMMs and neural networks, are used to model the dynamics of the gestures. We implemented a real-time system and performed gesture recognition experiments to compare performance across different combinations of the appearance and motion features. The system achieves a recognition accuracy of over 96% when both the appearance and motion cues are used.
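The pipeline described above (appearance features, differentiated motion cues, unsupervised clustering into symbols, and HMM-based modeling) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of plain k-means for the clustering step, the frame-to-frame difference as the motion cue, and every parameter value are assumptions made for illustration. The stereo-based appearance feature is taken as a given array with one row per image pair, and a left-to-right ("forward") HMM is assumed to be one whose transition matrix is upper triangular.

```python
import numpy as np


def motion_features(appearance):
    """Assumed motion cue: difference between consecutive appearance vectors."""
    return np.diff(appearance, axis=0)


def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means, standing in for the unsupervised learning step."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def quantize(features, centers):
    """Map each feature vector to the index (symbol) of its nearest center."""
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def forward_log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: log P(symbols | discrete HMM)."""
    alpha = start * emit[:, symbols[0]]
    log_lik = 0.0
    for t in range(len(symbols)):
        if t > 0:
            alpha = (alpha @ trans) * emit[:, symbols[t]]
        scale = alpha.sum()
        if scale == 0.0:  # sequence is impossible under this model
            return -np.inf
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik


def classify(symbols, models):
    """Pick the gesture whose HMM assigns the sequence the highest likelihood.

    `models` is a hypothetical dict: name -> (start, trans, emit).
    """
    scores = {name: forward_log_likelihood(symbols, *params)
              for name, params in models.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    appearance = rng.random((30, 8))           # toy stand-in: 30 image pairs
    motion = motion_features(appearance)       # 29 difference vectors
    combined = np.hstack([appearance[1:], motion])
    centers = kmeans(combined, k=4)
    symbols = quantize(combined, centers)
    # Toy left-to-right HMM: each state either stays or moves one step forward.
    start = np.array([1.0, 0.0, 0.0])
    trans = np.array([[0.7, 0.3, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]])
    emit = np.full((3, 4), 0.25)
    print(classify(symbols, {"toy_gesture": (start, trans, emit)}))
```

Concatenating the appearance and motion vectors before clustering is only one possible way to combine the two cues. In a real system the HMM parameters would be trained per gesture (for example with Baum-Welch) rather than set by hand as in the toy usage above.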
