Real-time human pose recognition in parts from single depth images

We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state of the art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.

[1]  B. A. Shepherd,et al.  An Appraisal of a Decision Tree Approach to Image Classification , 1983, IJCAI.

[2]  Teofilo F. GONZALEZ,et al.  Clustering to Minimize the Maximum Intercluster Distance , 1985, Theor. Comput. Sci..

[3]  Yali Amit,et al.  Shape Quantization and Recognition with Randomized Trees , 1997, Neural Computation.

[4]  Jitendra Malik,et al.  Tracking people with twists and exponential maps , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[5]  Dariu Gavrila,et al.  Pedestrian Detection from a Moving Vehicle , 2000, ECCV.

[6]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Michael J. Black,et al.  Implicit Probabilistic Models of Human Motion for Synthesis and Tracking , 2002, ECCV.

[8]  Jitendra Malik,et al.  Estimating Human Body Configurations Using Shape Context Matching , 2002, ECCV.

[9]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[11]  David A. Forsyth,et al.  Finding and tracking people from the bottom up , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[12]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[13]  David A. Forsyth,et al.  Probabilistic Methods for Finding People , 2001, International Journal of Computer Vision.

[14]  Ankur Agarwal,et al.  3D human pose from silhouettes by relevance vector regression , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[15]  Michael Isard,et al.  Tracking loose-limbed people , 2004, CVPR 2004.

[16]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[17]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[18]  Ben Taskar,et al.  Discriminative learning of Markov random fields for segmentation of 3D scan data , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19]  Vincent Lepetit,et al.  Randomized trees for real-time keypoint recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[20]  Reinhard Koch,et al.  Nonlinear Body Pose Estimation from Depth Images , 2005, DAGM-Symposium.

[21]  Rüdiger Dillmann,et al.  Sensor fusion for model based 3D tracking , 2006, 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems.

[22]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[23]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[24]  Rüdiger Dillmann,et al.  Sensor fusion for 3D human body tracking with an articulated 3D body model , 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006..

[25]  Jamie Shotton,et al.  The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[26]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[27]  Kikuo Fujimura,et al.  Constrained Optimization for Human Pose Estimation from Depth Sequences , 2007, ACCV.

[28]  Andrew W. Fitzgibbon,et al.  The Joint Manifold Model for Semi-supervised Multi-valued Regression , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[29]  Ronald Poppe,et al.  Vision-based human motion analysis: An overview , 2007, Comput. Vis. Image Underst..

[30]  Philip H. S. Torr,et al.  Randomized trees for human pose detection , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Toby Sharp,et al.  Implementing Decision Trees and Forests on a GPU , 2008, ECCV.

[32]  Zhuowen Tu,et al.  Auto-context and its application to high-level vision tasks , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Stefano Soatto,et al.  Relevant Feature Selection for Human Pose Estimation and Localization in Cluttered Images , 2008, ECCV.

[34]  Roberto Cipolla,et al.  Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Trevor Darrell,et al.  Sparse probabilistic regression for activity-independent human pose inference , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Yihong Gong,et al.  Discriminative learning of visual words for 3D human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[38]  Robert Y. Wang,et al.  Real-time hand-tracking with a color glove , 2009, ACM Trans. Graph..

[39]  Aaron Hertzmann,et al.  Learning 3D mesh segmentation and labeling , 2010, ACM Trans. Graph..

[40]  Sebastian Thrun,et al.  Real-time identification and localization of body parts from depth images , 2010, 2010 IEEE International Conference on Robotics and Automation.

[41]  Sebastian Thrun,et al.  Real time motion capture using a single time-of-flight camera , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[42]  Gérard G. Medioni,et al.  Human pose estimation from a single view point, real-time range sensor , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[43]  Kalpana C. Jondhale,et al.  Shape matching and object recognition using shape contexts , 2010 .

[44]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.