A Joint Model for 2D and 3D Pose Estimation from a Single Image

We introduce a novel approach to automatically recover 3D human pose from a single image. Most previous work follows a pipelined approach: initially, a set of 2D features such as edges, joints or silhouettes are detected in the image, and then these observations are used to infer the 3D pose. Solving these two problems separately may lead to erroneous 3D poses when the feature detector has performed poorly. In this paper, we address this issue by jointly solving both the 2D detection and the 3D inference problems. For this purpose, we propose a Bayesian framework that integrates a generative model based on latent variables and discriminative 2D part detectors based on HOGs, and perform inference using evolutionary algorithms. Real experimentation demonstrates competitive results, and the ability of our methodology to provide accurate 2D and 3D pose estimations even when the 2D detectors are inaccurate.

[1]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, CVPR.

[2]  Nikolaus Hansen,et al.  The CMA Evolution Strategy: A Comparing Review , 2006, Towards a New Evolutionary Computation.

[3]  Stan Sclaroff,et al.  Fast globally optimal 2D human detection with loopy graph models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Francesc Moreno-Noguer,et al.  Exploring Ambiguities for Monocular Non-rigid Shape Estimation , 2010, ECCV.

[5]  Cristian Sminchisescu,et al.  Generative modeling for continuous non-linearly embedded visual inference , 2004, ICML.

[6]  Stefano Soatto,et al.  Relevant Feature Selection for Human Pose Estimation and Localization in Cluttered Images , 2008, ECCV.

[7]  William T. Freeman,et al.  Bayesian Reconstruction of 3D Human Motion from Single-Camera Video , 1999, NIPS.

[8]  Peter V. Gehler,et al.  Teaching 3D geometry to deformable part models , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Neil D. Lawrence,et al.  Hierarchical Gaussian process latent variable models , 2007, ICML '07.

[10]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[11]  Ramakant Nevatia,et al.  Efficient Inference with Multiple Heterogeneous Part Detectors for Human Pose Estimation , 2010, ECCV.

[12]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[13]  T. Kanade,et al.  Reconstructing 3D Human Pose from 2D Image Landmarks , 2012, ECCV.

[14]  Liang Huang,et al.  Advanced Dynamic Programming in Semiring and Hypergraph Frameworks , 2008, COLING.

[15]  David J. Fleet,et al.  3D People Tracking with Gaussian Process Dynamical Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[17]  Philip H. S. Torr,et al.  Randomized trees for human pose detection , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Francesc Moreno-Noguer,et al.  Stochastic Exploration of Ambiguities for Nonrigid Shape Recovery , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Ankur Agarwal,et al.  Recovering 3D human pose from monocular images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Xianghua Xie,et al.  Tracking 3D human pose with large root node uncertainty , 2011, CVPR 2011.

[21]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Yun Fu,et al.  Human Motion Tracking by Temporal-Spatial Local Gaussian Process Experts , 2011, IEEE Transactions on Image Processing.

[23]  Cristian Sminchisescu,et al.  Twin Gaussian Processes for Structured Prediction , 2010, International Journal of Computer Vision.

[24]  Bernt Schiele,et al.  Monocular 3D pose estimation and tracking by detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  David J. Fleet,et al.  Shared Kernel Information Embedding for Discriminative Inference , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Bernt Schiele,et al.  Discriminative Appearance Models for Pictorial Structures , 2011, International Journal of Computer Vision.

[27]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[28]  Francesc Moreno-Noguer,et al.  Single image 3D human pose estimation from noisy observations , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Raquel Urtasun,et al.  Combining discriminative and generative methods for 3D deformable surface and articulated pose reconstruction , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  Michael J. Black,et al.  Detailed Human Shape and Pose from Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.