A Unified Framework for Gesture Recognition and Spatiotemporal Gesture Segmentation

Within the context of hand gesture recognition, spatiotemporal gesture segmentation is the task of determining, in a video sequence, where the gesturing hand is located and when the gesture starts and ends. Existing gesture recognition methods typically assume known spatial segmentation, known temporal segmentation, or both. This paper introduces a unified framework for simultaneously performing spatial segmentation, temporal segmentation, and recognition. In the proposed framework, information flows both bottom-up and top-down. A gesture can be recognized even when the hand location is highly ambiguous and when the start and end times of the gesture are unknown. Thus, the method can be applied to continuous image streams where gestures are performed in front of moving, cluttered backgrounds. The proposed method makes three novel contributions: a spatiotemporal matching algorithm that can accommodate multiple candidate hand detections in every frame, a classifier-based pruning framework that enables accurate and early rejection of poor matches to gesture models, and a subgesture reasoning algorithm that learns which gesture models can falsely match parts of other, longer gestures. The performance of the approach is evaluated on two challenging applications: recognition of hand-signed digits gestured by users wearing short-sleeved shirts in front of a cluttered background, and retrieval of occurrences of signs of interest in a video database containing continuous, unsegmented signing in American Sign Language (ASL).
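To illustrate the idea of matching against multiple candidate hand detections per frame, the following is a minimal sketch, not the paper's implementation. It assumes a gesture model is a sequence of 2D hand positions, each frame supplies a non-empty list of candidate positions, and the local cost is Euclidean distance; the function name `match_cost` and these feature choices are hypothetical. It extends dynamic time warping with a third index over candidates, and it assumes the gesture spans the whole input, so it does not perform temporal segmentation, pruning, or subgesture reasoning.

```python
import math


def match_cost(model, frames):
    """Warp a gesture model onto frames that each hold several hand candidates.

    model  : list of (x, y) positions, one per model state (assumed feature).
    frames : list of frames; each frame is a non-empty list of (x, y) candidates.
    Returns the lowest total cost over all warping paths and candidate choices.
    """
    m, n = len(model), len(frames)
    # D[i][j][k]: best cost of aligning model[0..i] with frames[0..j],
    # with the path ending at candidate k of frame j.
    D = [[[math.inf] * len(frames[j]) for j in range(n)] for i in range(m)]
    for j in range(n):
        for i in range(m):
            for k, cand in enumerate(frames[j]):
                local = math.dist(model[i], cand)
                if i == 0 and j == 0:
                    best_prev = 0.0
                else:
                    best_prev = math.inf
                    if i > 0:
                        # Same frame: the hand stays at the same candidate k.
                        best_prev = min(best_prev, D[i - 1][j][k])
                    if j > 0:
                        # Previous frame: any candidate may precede this one.
                        best_prev = min(best_prev, min(D[i][j - 1]))
                        if i > 0:
                            best_prev = min(best_prev, min(D[i - 1][j - 1]))
                D[i][j][k] = local + best_prev
    return min(D[m - 1][n - 1])


# Hypothetical data: the true hand moves right; every frame also has clutter.
model = [(0, 0), (1, 0), (2, 0)]
frames = [[(0, 0), (5, 5)], [(9, 9), (1, 0)], [(2, 0), (7, 1)]]
cost = match_cost(model, frames)
```

Because the minimization runs over candidates as well as over time alignments, the hand location in each frame is chosen as a by-product of finding the best match to the model rather than being fixed beforehand.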
