Efficient initialization of Mixtures of Experts for human pose estimation

This paper addresses the problem of recovering 3D human pose from a single monocular image. In the literature, Bayesian Mixtures of Experts (BME) have been used successfully to represent the multimodal image-to-pose distribution. However, the expectation-maximization (EM) algorithm used to learn a BME model may converge to a suboptimal local maximum, and the quality of the final solution depends largely on the initial values. In this paper, we propose an efficient initialization method for BME learning. We first partition the training set so that each subset can be modeled well by a single expert and the total regression error is minimized. Each expert and gate of the BME model is then initialized on one subset of the partition. Our initialization method is tested on both a quasi-synthetic dataset and a real dataset (HumanEva). Results show that it greatly reduces the computational cost of training while improving testing accuracy.
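The partitioning idea can be illustrated with a minimal sketch. The abstract does not specify the partitioning algorithm, so the code below assumes a simple alternating scheme (fit one linear regressor per subset, then reassign each sample to the regressor with the smallest residual); it is not necessarily the method used in the paper. All function names (`fit_linear`, `squared_errors`, `fit_experts`, `partition_for_experts`) and the ridge term are hypothetical choices made for this illustration.

```python
import numpy as np

def _with_bias(X):
    # Append a constant column so each expert has an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_linear(X, Y, ridge=1e-6):
    # Ridge-regularized least squares (assumed expert form: linear regressor).
    Xb = _with_bias(X)
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def squared_errors(W, X, Y):
    # Per-sample squared residual of expert W.
    return ((_with_bias(X) @ W - Y) ** 2).sum(axis=1)

def fit_experts(X, Y, labels, n_experts, rng):
    # Fit one expert per subset; an empty subset is re-seeded with one random sample.
    experts = []
    for k in range(n_experts):
        idx = np.flatnonzero(labels == k)
        if idx.size == 0:
            idx = rng.integers(X.shape[0], size=1)
        experts.append(fit_linear(X[idx], Y[idx]))
    return experts

def partition_for_experts(X, Y, n_experts, n_iters=50, seed=0):
    # Alternate between fitting experts and reassigning samples to the
    # expert with the lowest regression error, starting from random labels.
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_experts, size=X.shape[0])
    for _ in range(n_iters):
        experts = fit_experts(X, Y, labels, n_experts, rng)
        errors = np.stack([squared_errors(W, X, Y) for W in experts], axis=1)
        new_labels = errors.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    # Refit so the returned experts match the returned partition.
    experts = fit_experts(X, Y, labels, n_experts, rng)
    return labels, experts
```

Under these assumptions, the returned `experts` would seed the expert parameters before EM, and the `labels` could serve as targets for fitting initial gate parameters (for example, with a multiclass classifier on the inputs). Like EM itself, this alternating scheme only finds a local minimum of the total regression error.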
