Discriminative fusion of shape and appearance features for human pose estimation

This paper presents a method for combining shape and appearance features in a discriminative learning framework for human pose estimation. We first present a new appearance descriptor for 3D human pose estimation that is distinctive and resilient to noise. We then use discriminative learning to combine the proposed appearance descriptor with a shape descriptor computed from the silhouette of the human subject. Our method, which we refer to as localized decision-level fusion, clusters the output pose space into several partitions and learns a decision-level fusion model for the shape and appearance descriptors in each partition. Combining the shape and appearance descriptors allows the complementary information in the two feature types to be exploited, which improves the performance of the pose estimation system. We compare the proposed fusion method with feature-level fusion and kernel-level fusion methods on a synchronized video and 3D motion dataset. Our experimental results show that the combined features give more accurate pose estimates than either feature type alone. Among the three fusion methods, localized decision-level fusion performs best for 3D pose estimation.
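
The localized decision-level fusion idea can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract only states that the pose space is clustered and that a fusion model is learned per partition. Everything else here is an assumption made for illustration, including the ridge regressors used for each descriptor, the plain k-means clustering, the two scalar fusion weights per partition, the nearest-centroid rule for choosing a partition at test time, and all function names.

```python
import numpy as np

def kmeans(Y, k, iters=20, seed=0):
    """Plain k-means on pose vectors Y (n, d); returns centers and labels."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = Y[labels == j].mean(axis=0)
    d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.argmin(axis=1)

def fit_ridge(X, Y, lam=1e-3):
    """Ridge regression with a bias column (assumed stand-in regressor)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def predict_ridge(W, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ W

def fit_localized_fusion(X_shape, X_app, Y, k=3):
    """Train one regressor per descriptor, then one weight pair per pose partition."""
    W_shape, W_app = fit_ridge(X_shape, Y), fit_ridge(X_app, Y)
    P_shape, P_app = predict_ridge(W_shape, X_shape), predict_ridge(W_app, X_app)
    centers, labels = kmeans(Y, k)
    weights = np.full((k, 2), 0.5)  # fallback: plain average of the two outputs
    for j in range(k):
        m = labels == j
        if m.sum() < 2:
            continue
        # Least-squares fit of y ~ w_shape * p_shape + w_app * p_app in partition j.
        A = np.stack([P_shape[m].ravel(), P_app[m].ravel()], axis=1)
        weights[j] = np.linalg.lstsq(A, Y[m].ravel(), rcond=None)[0]
    return {"W_shape": W_shape, "W_app": W_app,
            "centers": centers, "weights": weights}

def predict_fused(model, X_shape, X_app):
    """Fuse the two per-descriptor predictions with partition-specific weights."""
    P_shape = predict_ridge(model["W_shape"], X_shape)
    P_app = predict_ridge(model["W_app"], X_app)
    # Assumption: pick the partition whose centroid is nearest to the mean prediction.
    mean_pred = 0.5 * (P_shape + P_app)
    d = ((mean_pred[:, None, :] - model["centers"][None, :, :]) ** 2).sum(axis=2)
    w = model["weights"][d.argmin(axis=1)]  # shape (n, 2)
    return w[:, :1] * P_shape + w[:, 1:] * P_app
```

In this sketch, `fit_localized_fusion(X_shape, X_app, Y, k)` takes one feature matrix per descriptor plus the training poses, and `predict_fused` returns one pose estimate per test row. Fitting the weights separately in each partition lets the relative trust in shape and appearance vary across the pose space, which is the motivation for a localized rather than a single global fusion model.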
