Semi-supervised Learning of Joint Density Models for Human Pose Estimation

Learning regression models (for example, for body pose estimation, or BPE) currently requires large numbers of training examples: pairs of the form (image, pose parameters). For many problems these examples are difficult to obtain, demanding considerable manual labelling effort. However, unlabelled examples are easy to obtain; in BPE, it suffices to collect many images and to sample many poses using motion capture. We show how unlabelled examples can improve the performance of such estimators, making better use of the difficult-to-obtain labelled training examples. Because the distribution of parameters conditioned on a given image is often multimodal, conventional regression models must be extended to allow for multiple modes. To date, such extensions have used a pre-set number of modes, independent of the contents of the input image, and amount to fitting several regressors simultaneously. Our framework instead models the joint distribution of images and poses, so the conditional estimates are inherently multimodal, and the number of modes is a function of the joint-space complexity rather than of the maximum number of output modes. We demonstrate, on synthetic examples and on a real pose estimation problem, the additional accuracy obtained by using unlabelled data.
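
To illustrate why conditioning a joint density gives multimodal estimates, the following minimal sketch assumes the joint density over (image feature x, pose y) is a Gaussian mixture. The abstract does not state the form of the density, so the mixture model, the function name `condition_gmm`, and the toy parameters are all assumptions made for illustration, not the authors' implementation. Each joint component contributes one conditional component, reweighted by how well it explains the observed x.

```python
import numpy as np


def condition_gmm(weights, means, covs, x, dx):
    """Condition a joint Gaussian mixture over z = (x, y) on an observed x.

    Hypothetical helper: returns the weights, means and covariances
    of the mixture p(y | x). `dx` is the dimension of x within z.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    log_w, cond_means, cond_covs = [], [], []
    for pi_k, mu, S in zip(weights, means, covs):
        mu_x, mu_y = mu[:dx], mu[dx:]
        Sxx, Sxy = S[:dx, :dx], S[:dx, dx:]
        Syx, Syy = S[dx:, :dx], S[dx:, dx:]
        diff = x - mu_x
        sol = np.linalg.solve(Sxx, diff)
        # Log of pi_k * N(x | mu_x, Sxx): the unnormalised conditional weight.
        _, logdet = np.linalg.slogdet(Sxx)
        log_w.append(
            np.log(pi_k) - 0.5 * (diff @ sol + logdet + dx * np.log(2 * np.pi))
        )
        # Standard Gaussian conditioning formulas for each component.
        cond_means.append(mu_y + Syx @ sol)
        cond_covs.append(Syy - Syx @ np.linalg.solve(Sxx, Sxy))
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return w / w.sum(), np.array(cond_means), np.array(cond_covs)


# Toy joint model (assumed values): two components that share the same
# x-marginal but have different pose means, so one x is ambiguous.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 2.0], [0.0, -2.0]])
covs = np.array([[[1.0, 0.5], [0.5, 1.0]]] * 2)

w, m, c = condition_gmm(weights, means, covs, x=[1.0], dx=1)
```

In this toy case both components explain x equally well, so the conditional p(y | x) keeps two equally weighted modes instead of collapsing to a single regression output. With fitted parameters, the number of significant conditional modes would vary with x.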
