Guided optimisation through classification and regression for hand pose estimation

Proposes hand pose estimation using a combination of model optimisation and discriminative methods which allows tracking to be performed at over 40 frames per second using a single CPU thread.Introduces a residual error regression for hand pose estimation, learning from mistakes in model optimisation.A method of training, which captures system response and user variance, allowing supervised feedback for joint refinement.Extensive quantitative and qualitative evaluation including additional datasets and comparison against multiple state of the art approaches. This paper presents an approach to hand pose estimation that combines discriminative and model-based methods to leverage the advantages of both. Randomised Decision Forests are trained using real data to provide fast coarse segmentation of the hand. The segmentation then forms the basis of constraints applied in model fitting, using an efficient projected GaussSeidel solver, which enforces temporal continuity and kinematic limitations. However, when fitting a generic model to multiple users with varying hand shape, there is likely to be residual errors between the model and their hand. Also, local minima can lead to failures in tracking that are difficult to recover from. Therefore, we introduce an error regression stage that learns to correct these instances of optimisation failure. The approach provides improved accuracy over the current state of the art methods, through the inclusion of temporal cohesion and by learning to correct from failure cases. Using discriminative learning, our approach performs guided optimisation, greatly reducing model fitting complexity and radically improves efficiency. This allows tracking to be performed at over 40 frames per second using a single CPU thread.

[1]  Tae-Kyun Kim,et al.  Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Christian Wolf,et al.  Hand Segmentation with Structured Convolutional Learning , 2014, ACCV.

[3]  Luc Van Gool,et al.  Tracking a hand manipulating an object , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Richard Bowden,et al.  Multi-touchless: Real-time fingertip detection and tracking using geodesic maxima , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[5]  Daniel Thalmann,et al.  Model-based hand pose estimation via spatial-temporal hand parsing and 3D fingertip localization , 2013, The Visual Computer.

[6]  Jian Sun,et al.  Cascaded hand pose regression , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Lale Akarun,et al.  Real time hand pose estimation using depth sensors , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[8]  Takeo Kanade,et al.  Visual Tracking of High DOF Articulated Structures: an Application to Human Hand Tracking , 1994, ECCV.

[9]  Andreas Aristidou,et al.  Motion capture with constrained inverse kinematics for real-time hand tracking , 2010, 2010 4th International Symposium on Communications, Control and Signal Processing (ISCCSP).

[10]  Luca Viganò,et al.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) , 2015, IWSEC 2015.

[11]  Antti Oulasvirta,et al.  Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Andrew Gilbert,et al.  A Multitouchless Interface: Expanding User Interaction , 2014, IEEE Computer Graphics and Applications.

[13]  Jovan Popović,et al.  Real-time hand-tracking with a color glove , 2009, SIGGRAPH 2009.

[14]  A. Buryanov,et al.  Proportions of Hand Segments , 2010 .

[15]  Dieter Fox,et al.  DART: dense articulated real-time tracking with consumer depth cameras , 2015, Auton. Robots.

[16]  Andrew W. Fitzgibbon,et al.  Accurate, Robust, and Flexible Real-time Hand Tracking , 2015, CHI.

[17]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[18]  Sterling Orsten,et al.  Dynamics based 3D skeletal hand tracking , 2013, I3D '13.

[19]  Daniel Thalmann,et al.  3D fingertip and palm tracking in depth image sequences , 2012, ACM Multimedia.

[20]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Paulo R. S. Mendonça,et al.  Model-based 3D tracking of an articulated hand , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[22]  Tae-Kyun Kim,et al.  Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests , 2013, 2013 IEEE International Conference on Computer Vision.

[23]  Chen Qian,et al.  Realtime and Robust Hand Tracking from Depth , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Stavros J. Perantonis,et al.  Hand Shape and 3D Pose Estimation Using Depth Data from a Single Cluttered Frame , 2012, ISVC.

[25]  S. Sathiya Keerthi,et al.  A fast procedure for computing the distance between complex objects in three-dimensional space , 1988, IEEE J. Robotics Autom..

[26]  Ying Wu,et al.  Modeling the constraints of human hand motion , 2000, Proceedings Workshop on Human Motion.

[27]  Ken Perlin,et al.  Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks , 2014, ACM Trans. Graph..

[28]  Paolo Dario,et al.  A Survey of Glove-Based Systems and Their Applications , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[29]  Lale Akarun,et al.  Hand Pose Estimation and Hand Shape Classification Using Multi-layered Randomized Decision Forests , 2012, ECCV.

[30]  Antonis A. Argyros,et al.  Full DOF tracking of a hand interacting with an object by modeling occlusions and physical constraints , 2011, 2011 International Conference on Computer Vision.

[31]  Antonis A. Argyros,et al.  Efficient model-based 3D tracking of hand articulations using Kinect , 2011, BMVC.

[32]  Varun Ramakrishna,et al.  User-Specific Hand Modeling from Monocular Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Andrew Gilbert,et al.  Combining discriminative and model based approaches for hand pose estimation , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[34]  Andrea Tagliasacchi,et al.  Robust Articulated-ICP for Real-Time Hand Tracking , 2015 .

[35]  Manolis I. A. Lourakis,et al.  Evolutionary Quasi-Random Search for Hand Articulations Tracking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Li Cheng,et al.  Efficient Hand Pose Estimation from a Single Depth Image , 2013, 2013 IEEE International Conference on Computer Vision.

[37]  Mircea Nicolescu,et al.  Vision-based hand pose estimation: A review , 2007, Comput. Vis. Image Underst..

[38]  Björn Stenger,et al.  Model-based hand tracking using a hierarchical Bayesian filter , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Hans-Peter Seidel,et al.  Construction and animation of anatomically based human hand models , 2003, SCA '03.