Online generative model personalization for hand tracking

We present a new algorithm for real-time hand tracking on commodity depth-sensing devices. Our method does not require a user-specific calibration session; instead, it learns the hand geometry as the user performs live in front of the camera, enabling seamless virtual interaction at the consumer level. The key novelty of our approach is an online optimization algorithm that jointly estimates pose and shape in each frame and quantifies the uncertainty of these estimates. This knowledge allows the algorithm to integrate per-frame estimates over time and build a personalized geometric model of the captured user. Our approach can easily be integrated into state-of-the-art continuous generative motion tracking software. We provide a detailed evaluation showing that our approach achieves accurate motion tracking for real-time applications while significantly simplifying the workflow of accurate hand performance capture. We also provide quantitative evaluation datasets at http://gfx.uvic.ca/datasets/handy.
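To make the idea of integrating uncertain per-frame estimates over time more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each shape parameter is tracked independently with a scalar Gaussian variance and fuses every new per-frame estimate by inverse-variance weighting (a Kalman update for a static state). The names `ShapeBelief` and `fuse` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ShapeBelief:
    """Running Gaussian belief over shape parameters.

    Assumption for this sketch: parameters are treated as independent,
    so only a per-parameter variance is stored (no full covariance).
    """
    mean: List[float]
    var: List[float]


def fuse(belief: ShapeBelief,
         estimate: Sequence[float],
         estimate_var: Sequence[float]) -> ShapeBelief:
    """Fuse one per-frame shape estimate into the running belief.

    Each parameter is updated with a scalar Kalman gain, so estimates
    with high variance (low confidence) move the belief only slightly.
    """
    new_mean: List[float] = []
    new_var: List[float] = []
    for m, v, x, r in zip(belief.mean, belief.var, estimate, estimate_var):
        gain = v / (v + r)                   # weight given to the new estimate
        new_mean.append(m + gain * (x - m))  # pull the mean toward the estimate
        new_var.append((1.0 - gain) * v)     # uncertainty shrinks after fusion
    return ShapeBelief(new_mean, new_var)
```

Under these assumptions, starting from a belief with mean 50 and variance 100 and fusing an estimate of 60 with variance 100 yields mean 55 and variance 50. An estimate with very large variance, for example from a frame in which a finger is occluded, leaves the belief almost unchanged.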
