Fast Random‐Forest‐Based Human Pose Estimation Using a Multi‐scale and Cascade Approach

Since the recent launch of Microsoft Xbox Kinect, research on 3D human pose estimation has attracted a lot of attention in the computer vision community. Kinect shows impressive estimation accuracy and real-time performance on massive graphics processing unit hardware. In this paper, we focus on further reducing the computation complexity of the existing state-of-the-art method to make the real-time 3D human pose estimation functionality applicable to devices with lower computing power. As a result, we propose two simple approaches to speed up the random-forest-based human pose estimation method. In the original algorithm, the random forest classifier is applied to all pixels of the segmented human depth image. We first use a multi-scale approach to reduce the number of such calculations. Second, the complexity of the random forest classification itself is decreased by the proposed cascade approach. Experiment results for real data show that our method is effective and works in real time (30 fps) without any parallelization efforts.

[1]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[2]  Min Sun,et al.  Conditional regression forests for human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[4]  Tin Kam Ho,et al.  The Random Subspace Method for Constructing Decision Forests , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Vincent Lepetit,et al.  Randomized trees for real-time keypoint recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[6]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Sebastian Thrun,et al.  Real-time identification and localization of body parts from depth images , 2010, 2010 IEEE International Conference on Robotics and Automation.

[8]  Hans-Peter Seidel,et al.  A data-driven approach for real-time full body pose reconstruction from a depth camera , 2011, 2011 International Conference on Computer Vision.

[9]  Julius Ziegler,et al.  Tracking of the Articulated Upper Body on Multi-View Stereo Image Sequences , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[10]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[11]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[12]  Gérard G. Medioni,et al.  Human pose estimation from a single view point, real-time range sensor , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[13]  Reinhard Koch,et al.  Nonlinear Body Pose Estimation from Depth Images , 2005, DAGM-Symposium.

[14]  Yali Amit,et al.  Shape Quantization and Recognition with Randomized Trees , 1997, Neural Computation.

[15]  Andrew W. Fitzgibbon,et al.  Efficient regression of general-activity human poses from depth images , 2011, 2011 International Conference on Computer Vision.

[16]  Rüdiger Dillmann,et al.  Sensor fusion for 3D human body tracking with an articulated 3D body model , 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006..

[17]  Sebastian Thrun,et al.  Real time motion capture using a single time-of-flight camera , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Andrew W. Fitzgibbon,et al.  The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[20]  Behzad Dariush,et al.  Controlled human pose estimation from depth image streams , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[21]  Roberto Cipolla,et al.  Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Ronald Poppe,et al.  Vision-based human motion analysis: An overview , 2007, Comput. Vis. Image Underst..

[23]  Kikuo Fujimura,et al.  Constrained Optimization for Human Pose Estimation from Depth Sequences , 2007, ACCV.

[24]  Sergio Escalera Human Behavior Analysis from Depth Maps , 2012, AMDO.

[25]  Andrew Blake,et al.  Efficient Human Pose Estimation from Single Depth Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Gil-Haeng Lee,et al.  Tracking and Interaction Based on Hybrid Sensing for Virtual Environments , 2013 .

[27]  Sergio Escalera,et al.  Graph cuts optimization for multi-limb human segmentation in depth maps , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.