Human Pose Co-Estimation and Applications

Most existing techniques for articulated Human Pose Estimation (HPE) consider each person independently. Here we tackle the problem in a new setting, coined Human Pose Coestimation (PCE), where multiple people are in a common, but unknown pose. The task of PCE is to estimate their poses jointly and to produce prototypes characterizing the shared pose. Since the poses of the individual people should be similar to the prototype, PCE has less freedom compared to estimating each pose independently, which simplifies the problem. We demonstrate our PCE technique on two applications. The first is estimating the pose of people performing the same activity synchronously, such as during aerobics, cheerleading, and dancing in a group. We show that PCE improves pose estimation accuracy over estimating each person independently. The second application is learning prototype poses characterizing a pose class directly from an image search engine queried by the class name (e.g., “lotus pose”). We show that PCE leads to better pose estimation in such images, and it learns meaningful prototypes which can be used as priors for pose estimation in novel images.

[1]  Andrew Zisserman,et al.  Long Term Arm and Hand Tracking for Continuous Sign Language TV Broadcasts , 2008, BMVC.

[2]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[3]  Vittorio Ferrari,et al.  Better Appearance Models for Pictorial Structures , 2009, BMVC.

[4]  Ben Taskar,et al.  Adaptive pose priors for pictorial structures , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Simon Lucey,et al.  Deterministic 3D Human Pose Estimation Using Rigid Structure , 2010, ECCV.

[6]  Vittorio Ferrari,et al.  We Are Family: Joint Pose Estimation of Multiple Persons , 2010, ECCV.

[7]  David A. Forsyth,et al.  Human Tracking with Mixtures of Trees , 2001, ICCV.

[8]  Ben Taskar,et al.  Cascaded Models for Articulated Pose Estimation , 2010, ECCV.

[9]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Andrew Zisserman,et al.  Pose search: Retrieving people using their pose , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Michael J. Black,et al.  Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Cristian Sminchisescu,et al.  Training Deformable Models for Localization , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[14]  Bernt Schiele,et al.  Monocular 3D pose estimation and tracking by detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Michael J. Black,et al.  Predicting 3D People from 2D Pictures , 2006, AMDO.

[16]  Daniel P. Huttenlocher,et al.  Beyond trees: common-factor models for 2D human pose recovery , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[18]  Jitendra Malik,et al.  Recovering human body configurations: combining segmentation and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[19]  Mun Wai Lee,et al.  Human Upper Body Pose Estimation in Static Images , 2004, ECCV.

[20]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[21]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[22]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  David A. Forsyth,et al.  Improved Human Parsing with a Full Relational Model , 2010, ECCV.

[24]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[25]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[26]  Pietro Perona,et al.  A Visual Category Filter for Google Images , 2004, ECCV.

[27]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[28]  H. Charles Romesburg,et al.  Cluster analysis for researchers , 1984 .

[29]  Cristian Sminchisescu,et al.  Semi-supervised Hierarchical Models for 3D Human Pose Reconstruction , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  F. Quimby What's in a picture? , 1993, Laboratory animal science.

[31]  Antonio Criminisi,et al.  Harvesting Image Databases from the Web , 2007, 2007 IEEE 11th International Conference on Computer Vision.