Fast Human Pose Detection Using Randomized Hierarchical Cascades of Rejectors

This paper addresses human detection and pose estimation from monocular images by formulating it as a classification problem. Our main contribution is a multi-class pose detector that combines the strongest components of state-of-the-art classifiers: hierarchical trees, cascades of rejectors, and randomized forests. Given a database of images with corresponding human poses, we define a set of classes by discretizing camera viewpoint and pose space. We first follow a bottom-up approach to build a hierarchical tree by recursively clustering and merging the classes at each level. For each branch of this decision tree, we take advantage of the alignment of training images to build a list of potentially discriminative HOG (Histograms of Oriented Gradients) features. We then select the HOG blocks that show the best rejection performance. Finally, we grow an ensemble of cascades by randomly sampling one of these HOG-based rejectors at each branch of the tree. The resulting multi-class classifier is then used to scan images in a sliding-window scheme. One property of our algorithm is that the randomization can be applied on-line at no extra cost, so that each window is classified with a different ensemble of randomized cascades. Compared to other pose classifiers, our approach detects poses quickly and efficiently with both fixed and moving cameras. We present results using different publicly available training and testing data sets.
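The randomized cascade idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` structure, the representation of a rejector as a `(feature_index, threshold)` pair, and the vote counting in `classify` are all assumptions made for illustration, and plain feature values stand in for HOG block responses. At each branch, one rejector is sampled at random from that branch's pool; a window that is rejected stops descending, and the leaves it reaches collect votes over several randomized cascades.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical rejector: (feature_index, threshold). A window passes the
# rejector when features[feature_index] >= threshold.
Rejector = Tuple[int, float]

@dataclass
class Node:
    rejectors: List[Rejector]            # pool of pre-selected rejectors
    children: List["Node"] = field(default_factory=list)
    label: Optional[int] = None          # pose class, set on leaves only

def run_cascade(node, features, rng):
    """Run one randomized cascade; return the leaf labels that were reached."""
    idx, thr = rng.choice(node.rejectors)  # sample one rejector per branch
    if features[idx] < thr:
        return []                          # rejected: prune this subtree
    if not node.children:
        return [node.label]
    reached = []
    for child in node.children:
        reached.extend(run_cascade(child, features, rng))
    return reached

def classify(root, features, n_cascades=10, seed=0):
    """Accumulate votes per pose class over an ensemble of cascades."""
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_cascades):
        for label in run_cascade(root, features, rng):
            votes[label] = votes.get(label, 0) + 1
    return votes

# Toy tree with two pose classes (0 and 1) under one root.
leaf_a = Node(rejectors=[(1, 0.5), (1, 0.4)], label=0)
leaf_b = Node(rejectors=[(2, 0.5)], label=1)
root = Node(rejectors=[(0, 0.5)], children=[leaf_a, leaf_b])

votes = classify(root, features=[1.0, 1.0, 0.0])
```

Because the rejector is drawn at classification time, each call with a different seed evaluates a different ensemble without any retraining, which is the property the abstract refers to as on-line randomization.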
