3D Hand Pose Estimation Using Randomized Decision Forest with Segmentation Index Points

In this paper, we propose a real-time 3D hand pose estimation algorithm based on the randomized decision forest framework. Our algorithm takes a depth image as input and outputs the 3D locations of a set of skeletal joints. Previous decision forest-based methods often assign labels to all points in a point cloud at a very early stage and then vote for the joint locations. By contrast, our algorithm tracks only a set of more flexible virtual landmark points, named segmentation index points (SIPs), before reaching the final decision at a leaf node. Roughly speaking, a SIP represents the centroid of a subset of skeletal joints, which are to be located at the leaves of the branch expanded from that SIP. Inspired by the recent latent regression forest-based hand pose estimation framework (Tang et al. 2014), we integrate SIPs into that framework with several important improvements. First, we devise a new forest-growing strategy whose split decisions are made using a randomized feature guided by SIPs. Second, we speed up training, since only SIPs, not the skeletal joints, are estimated at non-leaf nodes. Experimental results on public benchmark datasets clearly show the advantage of the proposed algorithm over previous state-of-the-art methods, and our algorithm runs at 55.5 fps on a standard CPU without parallelization.
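To make the SIP idea concrete, the following is a minimal Python sketch built on several assumptions of our own rather than details taken from the paper. It assumes ground-truth joints are given as an (N, 3) NumPy array, splits each joint subset in half as a placeholder for the actual partition rule, and uses a Shotton-style depth-difference test probed around the current SIP as one plausible form of a "randomized feature guided by SIPs". The helper names `sip`, `build_sip_tree`, `leaves`, and `sip_feature`, and the `background` constant, are hypothetical.

```python
import numpy as np


def sip(joints, subset):
    """Segmentation index point: centroid of the joints listed in `subset`."""
    return joints[list(subset)].mean(axis=0)


def build_sip_tree(joints, subset):
    """Recursively split a joint subset in two, storing the SIP at every node.

    Halving the index list is a placeholder split rule (an assumption),
    not the partition used in the paper.
    """
    subset = list(subset)
    node = {"subset": tuple(subset), "sip": sip(joints, subset)}
    if len(subset) > 1:
        mid = len(subset) // 2
        node["children"] = [build_sip_tree(joints, subset[:mid]),
                            build_sip_tree(joints, subset[mid:])]
    return node


def leaves(node):
    """Return leaf nodes left to right; each leaf holds a single joint."""
    if "children" not in node:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


def sip_feature(depth, sip_rc, u, v, background=1e4):
    """Depth difference between two probes placed at offsets u, v around a SIP.

    Offsets are divided by the depth at the SIP so the probe pattern shrinks
    as the hand moves away (a Shotton-style assumption). Probes that fall
    outside the image read the large constant `background`.
    """
    h, w = depth.shape
    # Clamp the SIP's (row, col) position into the image.
    r = min(max(int(round(sip_rc[0])), 0), h - 1)
    c = min(max(int(round(sip_rc[1])), 0), w - 1)
    d0 = depth[r, c]  # assumed > 0 (a valid depth reading)

    def probe(offset):
        pr = int(round(r + offset[0] / d0))
        pc = int(round(c + offset[1] / d0))
        if 0 <= pr < h and 0 <= pc < w:
            return depth[pr, pc]
        return background

    return probe(u) - probe(v)


if __name__ == "__main__":
    # Four toy joints at the corners of a 2 x 4 rectangle.
    joints = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0],
                       [0.0, 4.0, 0.0], [2.0, 4.0, 0.0]])
    root = build_sip_tree(joints, range(4))
    print(root["sip"])                           # centroid of all four joints
    print([n["sip"] for n in root["children"]])  # SIPs of the two joint subsets
```

In this sketch each internal node carries only its SIP, and individual joint positions appear only at the leaves, where a singleton subset's SIP coincides with the joint itself. This mirrors the abstract's point that only SIPs, not skeletal joints, are estimated at non-leaf nodes.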

[1] Javier Ruiz-del-Solar, et al. Real-Time Hand Gesture Detection and Recognition Using Boosted Classifiers and Active Learning, 2007, PSIVT.

[2] Ioannis A. Kakadiaris, et al. Estimating anthropometry and pose from a single image, 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No. PR00662).

[3] Rogério Schmidt Feris, et al. The isometric self-organizing map for 3D hand pose estimation, 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[4] Haibin Ling, et al. Shape Classification Using the Inner-Distance, 2007.

[5] Luc Van Gool, et al. Motion Capture of Hands in Action Using Discriminative Salient Points, 2012, ECCV.

[6] Gérard G. Medioni, et al. Human pose estimation from a single view point, real-time range sensor, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[7] Li Cheng, et al. Efficient Hand Pose Estimation from a Single Depth Image, 2013, 2013 IEEE International Conference on Computer Vision.

[8] Mircea Nicolescu, et al. Vision-based hand pose estimation: A review, 2007, Comput. Vis. Image Underst.

[9] Antonis A. Argyros, et al. Scalable 3D Tracking of Multiple Interacting Objects, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[10] Andrew W. Fitzgibbon, et al. Accurate, Robust, and Flexible Real-time Hand Tracking, 2015, CHI.

[11] Ioannis A. Kakadiaris, et al. Active part-decomposition, shape and motion estimation of articulated objects: a physics-based approach, 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[12] Sharath Pankanti, et al. Hand tracking by binary quadratic programming and its application to retail activity recognition, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13] Antonis A. Argyros, et al. Full DOF tracking of a hand interacting with an object by modeling occlusions and physical constraints, 2011, 2011 International Conference on Computer Vision.

[14] Antonis A. Argyros, et al. Tracking the articulated motion of two strongly interacting hands, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[16] Radford M. Neal, Pattern Recognition and Machine Learning, 2007, Technometrics.

[17] Haibin Ling, et al. Shape Classification Using the Inner-Distance, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18] Adolfo López, et al. Real-time upper body tracking with online initialization using a range sensor, 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[19] Lale Akarun, et al. Hand Pose Estimation and Hand Shape Classification Using Multi-layered Randomized Decision Forests, 2012, ECCV.

[20] Tae-Kyun Kim, et al. Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests, 2013, 2013 IEEE International Conference on Computer Vision.

[21] Chen Qian, et al. Realtime and Robust Hand Tracking from Depth, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22] Andrew W. Fitzgibbon, et al. Efficient regression of general-activity human poses from depth images, 2011, 2011 International Conference on Computer Vision.

[23] Manolis I. A. Lourakis, et al. Evolutionary Quasi-Random Search for Hand Articulations Tracking, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24] Rüdiger Dillmann, et al. Sensor fusion for 3D human body tracking with an articulated 3D body model, 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006.

[25] Ioannis A. Kakadiaris, et al. Estimating Anthropometry and Pose from a Single Uncalibrated Image, 2001, Comput. Vis. Image Underst.

[26] Antti Oulasvirta, et al. Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data, 2013, 2013 IEEE International Conference on Computer Vision.

[27] Andrew W. Fitzgibbon, et al. Real-time human pose recognition in parts from single depth images, 2011, CVPR 2011.

[28] Sterling Orsten, et al. Dynamics based 3D skeletal hand tracking, 2013, I3D '13.

[29] Andrew W. Fitzgibbon, et al. The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30] Luc Van Gool, et al. Tracking a hand manipulating an object, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[31] Tae-Kyun Kim, et al. Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.