Egocentric hand pose estimation and distance recovery in a single RGB image

Articulated hand pose recovery in egocentric vision is useful for in-air interaction with wearable devices such as Google Glass. Despite the progress achieved with depth cameras, the task remains challenging with ordinary RGB cameras. In this paper, we demonstrate that both the articulated hand pose and its distance from the camera can be recovered with a single RGB camera in egocentric view. We address this problem by modeling the distance as a hidden variable and using a Conditional Regression Forest to infer the pose and distance jointly. In particular, we find that pose estimation accuracy can be further improved by incorporating hand part semantics. Experimental results show that the proposed method performs well on both a synthesized dataset and several real-world color image sequences captured in different environments. In addition, our system runs in real time at more than 10 fps.
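
As a rough illustration of treating distance as a hidden variable, the sketch below marginalizes distance-conditioned pose predictions over a discretized distance posterior, i.e. p(pose | image) = sum_d p(pose | image, d) p(d | image). This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `fuse_pose_over_distance`, the discretization of distance into bins, and the array shapes are all hypothetical, and the per-bin pose predictions are assumed to come from regressors trained separately for each distance bin.

```python
import numpy as np


def fuse_pose_over_distance(dist_posterior, bin_centers, poses_per_bin):
    """Combine distance-conditioned pose predictions (hypothetical helper).

    dist_posterior: shape (K,), unnormalized p(d_k | image) over K distance bins.
    bin_centers:    shape (K,), representative distance of each bin (e.g. in cm).
    poses_per_bin:  shape (K, J, 2), predicted 2D positions of J joints,
                    one prediction per distance bin.
    Returns (expected_pose, expected_distance, map_bin_index).
    """
    w = np.asarray(dist_posterior, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1

    poses = np.asarray(poses_per_bin, dtype=float)
    # Weighted sum over the distance axis: result has shape (J, 2).
    expected_pose = np.tensordot(w, poses, axes=1)

    # Expected distance under the same posterior, plus the most likely bin.
    expected_distance = float(np.dot(w, np.asarray(bin_centers, dtype=float)))
    map_bin = int(np.argmax(w))
    return expected_pose, expected_distance, map_bin


# Toy example: two distance bins, one joint.
pose, dist, best = fuse_pose_over_distance(
    dist_posterior=[1.0, 3.0],
    bin_centers=[30.0, 50.0],
    poses_per_bin=[[[0.0, 0.0]], [[4.0, 8.0]]],
)
```

Averaging poses across bins is only one possible design choice; taking the prediction from the most likely bin (`map_bin`) is an equally plausible alternative when the distance posterior is sharply peaked.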
