Paper information - Data-driven Recovery of Hand Depth using Conditional Regressive Random Forest on Stereo Images


Hand pose is emerging as an important interface for human-computer interaction. This paper presents a data-driven method to estimate a high-quality depth map of a hand from a stereoscopic camera input by introducing a novel superpixel-based regression framework that takes advantage of the smoothness of the depth surface of the hand. To this end, we introduce the Conditional Regressive Random Forest (CRRF), a method that combines a Conditional Random Field (CRF) and a Regressive Random Forest (RRF) to model the mapping from a stereo RGB image pair to a depth image. The RRF provides a unary term that adaptively selects different stereo-matching measures as it implicitly determines matching pixels in a coarse-to-fine manner. While the RRF makes a depth prediction for each superpixel independently, the CRF unifies the depth predictions by modeling pairwise interactions between adjacent superpixels. Experimental results show that CRRF can generate a depth image more accurately than the leading contemporary techniques using an inexpensive stereo camera.
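
The interplay between an independent per-superpixel prediction (the unary term) and pairwise coupling of adjacent superpixels can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes a simple quadratic energy, sum_i (d_i - u_i)^2 + smoothness * sum_(i,j) (d_i - d_j)^2, where u_i stands in for the depth predicted by some regressor for superpixel i. The function name `smooth_superpixel_depths`, the `smoothness` weight, and the toy values are hypothetical and chosen only for illustration.

```python
import numpy as np


def smooth_superpixel_depths(unary_depth, edges, smoothness=1.0):
    """Minimise sum_i (d_i - u_i)^2 + smoothness * sum_(i,j) (d_i - d_j)^2.

    unary_depth: independent depth estimate per superpixel (assumed input).
    edges: pairs (i, j) of adjacent superpixel indices.
    """
    u = np.asarray(unary_depth, dtype=float)
    n = u.shape[0]

    # Graph Laplacian of the superpixel adjacency graph, so that
    # d^T L d equals the sum of squared differences over the edges.
    laplacian = np.zeros((n, n))
    for i, j in edges:
        laplacian[i, i] += 1.0
        laplacian[j, j] += 1.0
        laplacian[i, j] -= 1.0
        laplacian[j, i] -= 1.0

    # Setting the gradient of the energy to zero gives the linear system
    # (I + smoothness * L) d = u.
    return np.linalg.solve(np.eye(n) + smoothness * laplacian, u)


# Toy example: four superpixels in a chain, the third one is an outlier.
unary = [0.50, 0.52, 0.90, 0.51]
chain = [(0, 1), (1, 2), (2, 3)]
print(smooth_superpixel_depths(unary, chain, smoothness=2.0))
```

With `smoothness=0.0` the independent predictions are returned unchanged; larger values pull neighbouring superpixels toward a common depth, which is the role the abstract assigns to the pairwise term.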

Eduardo Alonso | R. Basaru | Chris Child | G. Slabaugh
