Latent Gaussian Mixture Regression for Human Pose Estimation

Discriminative approaches for human pose estimation model the functional mapping, or conditional distribution, between image features and 3D pose. Learning such multi-modal models in high dimensional spaces, however, is challenging with limited training data; often resulting in over-fitting and poor generalization. To address these issues latent variable models (LVMs) have been introduced. Shared LVMs attempt to learn a coherent, typically non-linear, latent space shared by image features and 3D poses, distribution of data in that latent space, and conditional distributions to and from this latent space to carry out inference. Discovering the shared manifold structure can, in itself, however, be challenging. In addition, shared LVMs models are most often non-parametric, requiring the model representation to be a function of the training set size. We present a parametric framework that addresses these shortcoming. In particular, we learn latent spaces, and distributions within them, for image features and 3D poses separately first, and then learn a multi-modal conditional density between these two lowdimensional spaces in the form of Gaussian Mixture Regression. Using our model we can address the issue of over-fitting and generalization, since the data is denser in the learned latent space, as well as avoid the necessity of learning a shared manifold for the data. We quantitatively evaluate and compare the performance of the proposed method to several state-of-the-art alternatives, and show that our method gives a competitive performance.

[1]  Cristian Sminchisescu,et al.  Covariance scaled sampling for monocular 3D body tracking , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[2]  Rui Li,et al.  Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[3]  Cristian Sminchisescu,et al.  Discriminative density propagation for 3D human motion estimation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[4]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[6]  Liefeng Bo,et al.  Structured output-associative regression , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Stefano Soatto,et al.  Fast Human Pose Estimation using Appearance and Motion via Multi-Dimensional Boosting Regression , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Luc Van Gool,et al.  Learning Generative Models for Multi-Activity Body Pose Estimation , 2008, International Journal of Computer Vision.

[9]  Neil D. Lawrence,et al.  Gaussian Process Latent Variable Models for Human Pose Estimation , 2007, MLMI.

[10]  Robert M. Thorndike,et al.  Canonical Correlation Analysis , 2000 .

[11]  A. Fathi,et al.  Human Pose Estimation using Motion Exemplars , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  Michael J. Black,et al.  Combined discriminative and generative articulated pose and non-rigid shape estimation , 2007, NIPS.

[13]  David J. Fleet,et al.  Stochastic Tracking of 3D Human Figures Using 2D Image Motion , 2000, ECCV.

[14]  Ahmed M. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[15]  Cristian Sminchisescu,et al.  Spectral Latent Variable Models for Perceptual Inference , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[16]  Andrew W. Fitzgibbon,et al.  The Joint Manifold Model for Semi-supervised Multi-valued Regression , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17]  Christopher M. Bishop,et al.  Non-linear Bayesian Image Modelling , 2000, ECCV.

[18]  Trevor Darrell,et al.  Sparse probabilistic regression for activity-independent human pose inference , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Andrei Popescu-Belis,et al.  Machine Learning for Multimodal Interaction , 4th International Workshop, MLMI 2007, Brno, Czech Republic, June 28-30, 2007, Revised Selected Papers , 2008, MLMI.

[20]  Gang Qian,et al.  Learning and Inference of 3D Human Poses from Gaussian Mixture Modeled Silhouettes , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[21]  E. Nadaraya On Estimating Regression , 1964 .

[22]  Xiaofei He,et al.  Locality Preserving Projections , 2003, NIPS.

[23]  Cristian Sminchisescu,et al.  Learning Joint Top-Down and Bottom-up Processes for 3D Visual Inference , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[24]  Ankur Agarwal,et al.  3D human pose from silhouettes by relevance vector regression , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[25]  Ankur Agarwal,et al.  Monocular Human Motion Capture with a Mixture of Regressors , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[26]  Dimitris N. Metaxas,et al.  Learning to Reconstruct 3 D Human Motion from Bayesian Mixtures of Experts . A Probabilistic Discriminative Approach , 2004 .

[27]  David J. Fleet,et al.  Shared Kernel Information Embedding for discriminative inference , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.