Easy Minimax Estimation with Random Forests for Human Pose Estimation

We describe a method for human parsing that is straightforward and competes with state-of-the-art performance on standard datasets. Unlike the state-of-the-art, our method does not search for individual body parts or poselets. Instead, a regression forest is used to predict a body configuration in body-space. The output of this regression forest is then combined in a novel way. Instead of averaging the output of each tree in the forest we use minimax to calculate optimal weights for the trees. This optimal weighting improves performance on rare poses and improves the generalization of our method to different datasets. Our paper demonstrates the unique advantage of random forest representations: minimax estimation is straightforward with no significant retraining burden.

[1]  Subhransu Maji,et al.  Object segmentation by alignment of poselet activations to image contours , 2011, CVPR 2011.

[2]  Camillo J. Taylor,et al.  Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image , 2000, Comput. Vis. Image Underst..

[3]  Ben Taskar,et al.  Cascaded Models for Articulated Pose Estimation , 2010, ECCV.

[4]  Andrew Zisserman,et al.  2D Articulated Human Pose Estimation and Retrieval in (Almost) Unconstrained Still Images , 2012, International Journal of Computer Vision.

[5]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Stan Sclaroff,et al.  Fast globally optimal 2D human detection with loopy graph models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, CVPR 2004.

[8]  Yang Wang,et al.  Multiple Tree Models for Occlusion and Spatial Constraints in Human Pose Estimation , 2008, ECCV.

[9]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  David A. Forsyth,et al.  Human Tracking with Mixtures of Trees , 2001, ICCV.

[11]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Yang Wang,et al.  Learning hierarchical poselets for human parsing , 2011, CVPR 2011.

[13]  Andrew Zisserman,et al.  Pose search: Retrieving people using their pose , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Luc Van Gool,et al.  Human Pose Estimation Using Body Parts Dependent Joint Regressors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Vittorio Ferrari,et al.  Better Appearance Models for Pictorial Structures , 2009, BMVC.

[16]  C. V. Jawahar,et al.  Video retrieval by mimicking poses , 2012, ICMR '12.

[17]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[18]  Ben Taskar,et al.  Adaptive pose priors for pictorial structures , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[19]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[20]  Hao Jiang,et al.  Global pose estimation using non-tree models , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  David A. McAllester,et al.  Cascade object detection with deformable part models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[23]  Jitendra Malik,et al.  Articulated Pose Estimation Using Discriminative Armlet Classifiers , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Jitendra Malik,et al.  Recovering human body configurations using pairwise constraints between parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[25]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[26]  Ioannis A. Kakadiaris,et al.  Model-based estimation of 3D human motion with occlusion based on active multi-viewpoint selection , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[27]  David A. Forsyth,et al.  Searching Video for Complex Activities with Finite State Models , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[29]  Subhransu Maji,et al.  Describing people: A poselet-based approach to attribute classification , 2011, 2011 International Conference on Computer Vision.

[30]  David A. Forsyth,et al.  Finding and tracking people from the bottom up , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[31]  Ben Taskar,et al.  MODEC: Multimodal Decomposable Models for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  David A. Forsyth,et al.  Improved Human Parsing with a Full Relational Model , 2010, ECCV.

[33]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.