Robust Hand Pose Regression Using Convolutional Neural Networks

Hand pose estimation is useful for several human-computer interaction applications, such as sign language recognition, the identification of more complex behaviors like hand gestures, and interaction in virtual reality. In this work, we propose a system that predicts 2D hand joint positions using a monocular color camera. To obtain ground truth, we use a 3D hand tracking sensor whose joint positions are projected onto the camera image plane. We present a novel pipeline that leverages deep learning techniques for hand pose estimation. The proposed Convolutional Neural Network (CNN) infers the joints of the hand from an image without the need for any additional sensor.
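The projection of 3D ground truth onto the image plane can be illustrated with a standard pinhole camera model. The following is a minimal sketch under stated assumptions, not the pipeline used in this work: the function name project_joints, the intrinsic matrix K, the sensor-to-camera rotation R and translation t, and the joint coordinates are all hypothetical values chosen for illustration. In practice, K, R, and t would come from a camera calibration step.

```python
import numpy as np

def project_joints(joints_3d, K, R, t):
    """Project Nx3 joint positions (sensor frame) to Nx2 pixel coordinates.

    Assumes an ideal pinhole camera without lens distortion.
    """
    joints_3d = np.asarray(joints_3d, dtype=float)
    cam = joints_3d @ R.T + t        # sensor frame -> camera frame
    uvw = cam @ K.T                  # apply camera intrinsics
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide by depth

# Hypothetical intrinsics: 600 px focal length, principal point (320, 240).
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
# Hypothetical extrinsics: sensor and camera frames coincide.
R = np.eye(3)
t = np.zeros(3)

# Two illustrative joints, in metres, 0.5 m in front of the camera.
joints = np.array([[0.0, 0.0, 0.5],
                   [0.05, -0.02, 0.5]])
pixels = project_joints(joints, K, R, t)
print(pixels)
```

A joint on the optical axis lands on the principal point, which is a quick sanity check for the calibration values. The resulting 2D coordinates are the kind of per-joint targets a regression network could be trained against.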
