Evaluating shape and appearance descriptors for 3D human pose estimation

In this paper, we present a comparative evaluation of several appearance and shape descriptors for 3D human pose estimation. Among the shape descriptors, we evaluate the Discrete Cosine Transform (DCT) and the Histogram of Shape Context (HoSC). The five appearance descriptors that we evaluate are all variants of the Histogram of Oriented Gradients (HOG) descriptor. We evaluate all descriptors quantitatively on the HumanEva-I dataset and report their performance with two regression methods: Relevance Vector Machine (RVM) regression and k-nearest neighbor (KNN) regression. The appearance descriptor computed over multiple spatial regions gave the best performance with RVM regression, whereas the DCT descriptor performed best with KNN regression.
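As a rough illustration of the KNN regression setting mentioned above, the following minimal sketch predicts a pose vector as the unweighted mean of the poses attached to the k nearest training descriptors. It is not the paper's implementation: the function name `knn_regress`, the Euclidean distance, the unweighted averaging, and the toy data are all assumptions made for illustration.

```python
import math

def knn_regress(query, train_descriptors, train_poses, k=3):
    """Predict a pose vector as the mean pose of the k nearest training descriptors."""
    # Euclidean distance from the query descriptor to every training descriptor
    # (the distance measure is an assumption of this sketch).
    dists = [math.dist(query, d) for d in train_descriptors]
    # Indices of the k closest training examples.
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    dim = len(train_poses[0])
    # Unweighted average of the neighbours' pose vectors.
    return [sum(train_poses[i][j] for i in nearest) / len(nearest) for j in range(dim)]

# Hypothetical toy data: 2-D "descriptors" paired with 3-D "poses".
descriptors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
poses = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [9.0, 9.0, 9.0]]
estimate = knn_regress([0.1, 0.1], descriptors, poses, k=3)
```

In a real pipeline the descriptors would be DCT, HoSC, or HOG feature vectors and the poses would be joint positions or angles; the regression step itself stays the same.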
