Accurate Human Pose Estimation by Aggregating Multiple Pose Hypotheses Using Modified Kernel Density Approximation

This letter proposes an accurate human pose estimation method that applies a modified kernel density approximation (m-KDA) to multiple pose hypotheses. Existing methods often estimate human poses poorly when the background is cluttered or the body occludes itself. To improve pose estimation accuracy, we propose to use m-KDA to aggregate multiple pose estimation results. First, we use the flexible mixture-of-parts model (FMM) to estimate human poses and keep the M highest-scoring results as the pose hypotheses. Second, we aggregate the top-M pose hypotheses with the m-KDA, in which each kernel density function is modified by the pose's score value and by the pose's compatibility function, which represents how far the pose hypothesis departs from the nominal value of the top-M pose hypotheses. Third, we determine the optimal pose configuration by repeating the above m-KDA computation sequentially, starting from the root part (head) and proceeding to the leaf parts (hands and feet). In pose estimation experiments on two benchmark datasets (PARSE and LSP), the proposed method achieved a 1.5-4.0% improvement in the percentage of correctly localized parts (PCP) over the state-of-the-art methods.
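The aggregation step for a single body part can be sketched as follows. This is only an illustration under stated assumptions, not the formulation used in the letter: the abstract does not give the kernel, the compatibility function, or the definition of the nominal value, so the sketch assumes Gaussian kernels, takes the coordinate-wise median of the top-M locations as the nominal value, uses a Gaussian fall-off for compatibility, and assumes non-negative scores. The function name `aggregate_part` and the parameters `bandwidth` and `compat_scale` are hypothetical. The sequential root-to-leaf pass is not shown.

```python
import numpy as np

def aggregate_part(locations, scores, bandwidth=10.0, compat_scale=20.0):
    """Pick one part location from top-M hypotheses via a weighted KDE.

    locations: (M, 2) array-like of candidate (x, y) positions for one part.
    scores:    (M,) array-like of non-negative hypothesis scores (assumed).
    Returns the hypothesis with the highest weighted density, plus the
    density evaluated at every hypothesis.
    """
    locations = np.asarray(locations, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Assumed nominal value: coordinate-wise median of the hypotheses.
    nominal = np.median(locations, axis=0)

    # Assumed compatibility: Gaussian fall-off with distance to the nominal.
    dist_to_nominal = np.linalg.norm(locations - nominal, axis=1)
    compat = np.exp(-0.5 * (dist_to_nominal / compat_scale) ** 2)

    # Each kernel is weighted by its score and its compatibility.
    weights = scores * compat
    weights = weights / weights.sum()

    # Gaussian kernel between every pair of hypotheses, shape (M, M).
    diffs = locations[:, None, :] - locations[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=-1)
    kernel = np.exp(-0.5 * sq_dist / bandwidth ** 2)

    # Weighted density evaluated at each hypothesis location.
    density = kernel @ weights
    return locations[np.argmax(density)], density

# Three hypotheses agree; the fourth has the top score but is far away.
head_candidates = [[100, 100], [102, 101], [99, 98], [160, 40]]
head_scores = [1.0, 0.9, 0.8, 1.2]
best, density = aggregate_part(head_candidates, head_scores)
print(best, density)
```

In this toy input the highest-scoring hypothesis is an outlier, so its compatibility weight is small and the selected location comes from the cluster of agreeing hypotheses. That is the behavior the score-and-compatibility weighting is meant to produce, although the actual weighting in the letter may differ.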
