Real-time viewpoint-invariant hand localization with cluttered backgrounds

Over the past few years there has been growing interest in visual interfaces based on gestures. Using gestures as a means of communicating with a computer can be helpful in applications such as gaming platforms, domotic environments, augmented reality, and sign language interpretation, to name a few. However, a serious bottleneck for such interfaces is the current lack of accurate hand localization systems, which are necessary for tracking (re-)initialization and hand pose understanding. The human hand is an articulated object with a very large degree of appearance variability, which is difficult to deal with. For instance, recent attempts to solve this problem using machine learning approaches have shown poor generalization over different viewpoints and finger spatial configurations. In this article we present a model-based approach to articulated hand detection that splits this variability problem by searching separately for simple finger models in the input image. A generic finger silhouette is localized in the edge map of the input image by combining curve and graph matching techniques. Cluttered backgrounds and heavily textured images, which usually make it hard to compare edge information with silhouette models (e.g., using chamfer distance or voting-based methods), are dealt with in our approach by simultaneously using connected curves and topological information. Finally, detected fingers are clustered using geometric constraints. Our system is able to localize in real time a hand with variable finger configurations in images with complex backgrounds, different lighting conditions, and different positions of the hand with respect to the camera. Experiments with real images and videos, and a simple visual interface, are presented to validate the proposed method.
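To illustrate why clutter is a problem for the silhouette comparisons mentioned above, the following is a minimal sketch of a brute-force directed chamfer distance. It is not the method proposed in the article. The finger template, the point coordinates, and the helper names `chamfer_distance` and `translate` are hypothetical and chosen only for this toy example.

```python
from math import hypot


def chamfer_distance(template, edges):
    """Mean distance from each template point to its nearest edge point."""
    if not template or not edges:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for tx, ty in template:
        total += min(hypot(tx - ex, ty - ey) for ex, ey in edges)
    return total / len(template)


def translate(points, dx, dy):
    """Shift every (x, y) point by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]


# Hypothetical finger silhouette: two parallel sides and a single tip point.
finger = [(0, y) for y in range(6)] + [(2, y) for y in range(6)] + [(1, 6)]

# Clean edge map: only the finger, placed at offset (10, 10).
clean_edges = translate(finger, 10, 10)

# Cluttered edge map: the same finger plus a dense block of texture edges.
clutter = [(x, y) for x in range(30, 40) for y in range(30, 40)]
cluttered_edges = clean_edges + clutter

# Score at the true finger position (lower is better).
score_true = chamfer_distance(translate(finger, 10, 10), cluttered_edges)

# Score at a wrong position that happens to lie inside the texture block.
score_false_clean = chamfer_distance(translate(finger, 32, 32), clean_edges)
score_false_clutter = chamfer_distance(translate(finger, 32, 32), cluttered_edges)
```

In this toy setup the wrong position scores as well as the true one once the texture block is present, because every template point finds an edge pixel directly beneath it. This is the kind of ambiguity that motivates using connected curves and topological information rather than unordered edge pixels.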
