A hand pose tracking benchmark from stereo matching

In this paper we establish a long-term 3D hand pose tracking benchmark. It contains 18,000 stereo image pairs, together with the ground-truth 3D positions of the palm and finger joints, captured in a range of scenarios. In addition, to segment the hand accurately from stereo images, we propose a novel stereo-based hand segmentation and depth estimation algorithm specifically tailored for hand tracking. Experiments demonstrate the effectiveness of the proposed algorithm: its tracking performance is comparable to that obtained with an active depth sensor under a variety of challenging scenarios.
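The abstract does not spell out the segmentation and depth estimation steps. As a rough illustration only, the sketch below shows the two generic building blocks such a pipeline rests on, under stated assumptions. The first is standard triangulation for a rectified stereo pair, Z = f·B/d, with focal length f in pixels, baseline B in metres, and disparity d in pixels. The second is a deliberately simple, hypothetical segmentation heuristic that treats the hand as the surface nearest the camera and keeps pixels within a fixed depth band of the closest valid point. This heuristic is a stand-in, not the authors' algorithm, and the names `focal_px`, `baseline_m` and `band_m` are illustrative.

```python
from typing import List, Optional

Depth = List[List[Optional[float]]]


def disparity_to_depth(disparity: List[List[float]],
                       focal_px: float, baseline_m: float) -> Depth:
    """Triangulate depth Z = f * B / d for a rectified stereo pair.

    Pixels with non-positive disparity (no match, or occluded) are
    marked invalid with None.
    """
    return [[focal_px * baseline_m / d if d > 0 else None for d in row]
            for row in disparity]


def segment_nearest_band(depth: Depth, band_m: float) -> List[List[bool]]:
    """Hypothetical heuristic: assume the hand is the closest object.

    Keeps every valid pixel whose depth lies within band_m metres of the
    nearest valid depth in the image. Returns an all-False mask when no
    pixel has a valid depth.
    """
    valid = [z for row in depth for z in row if z is not None]
    if not valid:
        return [[False] * len(row) for row in depth]
    z_max = min(valid) + band_m
    return [[z is not None and z <= z_max for z in row] for row in depth]


if __name__ == "__main__":
    # Toy 2x3 disparity map; 0.0 marks pixels with no valid match.
    disp = [[0.0, 40.0, 42.0],
            [10.0, 41.0, 0.0]]
    depth = disparity_to_depth(disp, focal_px=500.0, baseline_m=0.06)
    mask = segment_nearest_band(depth, band_m=0.1)
    print(depth)
    print(mask)
```

In this toy example the three pixels at about 0.71 to 0.75 m fall in the mask, and the background pixel at 3 m is rejected. A real system would also need to handle the hand touching or occluding other near objects, which is where a depth band alone breaks down.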