Adaptive appearance learning for human pose estimation

We address the problem of pose estimation in videos. The part detectors play important roles, but traditional template-based detectors (e.g. Histogram of Gradient, HoG) fail at pose estimation due to the high variability in appearance. We present an adaptive representation of appearance and shape for articulated human body. The full representation of human body is based on the flexible mixture-of-parts model. We train a Naive Bayes classifier to obtain a confidence score of estimated pose by the basic mixture model, and based on the confidence we learn an instance-specific appearance model. For between-frame consistency, we design a time-efficient energy function for motion cues instead of complex motion models. We incorporate these models into a framework that allows for efficient inference. Quantitative evaluation of pose estimation conducted on two video datasets demonstrates the effectiveness of the proposed method.

[1]  Deva Ramanan,et al.  N-best maximal decoders for part models , 2011, 2011 International Conference on Computer Vision.

[2]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[3]  Bernt Schiele,et al.  Articulated people detection and pose estimation: Reshaping the future , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Yi Yang,et al.  Articulated Human Detection with Flexible Mixtures of Parts , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Mubarak Shah,et al.  Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Jianbo Shi,et al.  Bottom-up Recognition and Parsing of the Human Body , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[8]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[9]  Ben Taskar,et al.  Adaptive pose priors for pictorial structures , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Ben Taskar,et al.  Parsing human motion with stretchable models , 2011, CVPR 2011.

[12]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15]  Andrew Zisserman,et al.  2D Articulated Human Pose Estimation and Retrieval in (Almost) Unconstrained Still Images , 2012, International Journal of Computer Vision.