Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation

We investigate the task of 2D articulated human pose estimation in unconstrained still images. This is extremely challenging because of variation in pose, anatomy, clothing, and imaging conditions. Current methods use simple models of body part appearance and plausible configurations due to limitations of available training data and constraints on computational expense. We show that such models severely limit accuracy. Building on the successful pictorial structure model (PSM), we propose richer models of both appearance and pose, using state-of-the-art discriminative classifiers without introducing unacceptable computational expense. We introduce a new annotated database of challenging consumer images, an order of magnitude larger than currently available datasets, and demonstrate over 50% relative improvement in pose estimation accuracy over a state-of-the-art method.
