Coupling Top-down and Bottom-up Methods for 3D Human Pose and Shape Estimation from Monocular Image Sequences

Until recently Intelligence, Surveillance, and Reconnaissance (ISR) focused on acquiring behavioral information of the targets and their activities. Continuous evolution of intelligence being gathered of the human centric activities has put increased focus on the humans, especially inferring their innate characteristics - size, shapes and physiology. These bio-signatures extracted from the surveillance sensors can be used to deduce age, ethnicity, gender and actions, and further characterize human actions in unseen scenarios. However, recovery of pose and shape of humans in such monocular videos is inherently an ill-posed problem, marked by frequent depth and view based ambiguities due to self-occlusion, foreshortening and misalignment. The likelihood function often yields a highly multimodal posterior that is difficult to propagate even using the most advanced particle filtering(PF) algorithms. Motivated by the recent success of the discriminative approaches to efficiently predict 3D poses directly from the 2D images, we present several principled approaches to integrate predictive cues using learned regression models to sustain multimodality of the posterior during tracking. Additionally, these learned priors can be actively adapted to the test data using a likelihood based feedback mechanism. Estimated 3D poses are then used to fit 3D human shape model to each frame independently for inferring anthropometric bio-signatures. The proposed system is fully automated, robust to noisy test data and has ability to swiftly recover from tracking failures even after confronting with significant errors. We evaluate the system on a large number of monocular human motion sequences.

[1]  Neil J. Gordon,et al.  A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking , 2002, IEEE Trans. Signal Process..

[2]  Rómer Rosales,et al.  Learning Body Pose via Specialized Maps , 2001, NIPS.

[3]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[4]  Patrick Pérez,et al.  Maintaining multimodality through mixture tracking , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5]  A. Doucet,et al.  A Tutorial on Particle Filtering and Smoothing: Fifteen years later , 2008 .

[6]  Michael J. Black,et al.  Combined discriminative and generative articulated pose and non-rigid shape estimation , 2007, NIPS.

[7]  Cristian Sminchisescu,et al.  Spectral Latent Variable Models for Perceptual Inference , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[8]  George Eastman House,et al.  Sparse Bayesian Learning and the Relevance Vector Machine , 2001 .

[9]  Michael Isard,et al.  ICONDENSATION: Unifying Low-Level and High-Level Tracking in a Stochastic Framework , 1998, ECCV.

[10]  Raquel Urtasun,et al.  Combining discriminative and generative methods for 3D deformable surface and articulated pose reconstruction , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  Cristian Sminchisescu,et al.  Learning Joint Top-Down and Bottom-up Processes for 3D Visual Inference , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Mun Wai Lee,et al.  Proposal maps driven MCMC for estimating human body pose in static images , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[13]  Kathleen M. Robinette,et al.  Civilian American and European Surface Anthropometry Resource (CAESAR). Volume 2: Descriptions , 2002 .

[14]  Ramakant Nevatia,et al.  Human Pose Tracking in Monocular Sequence Using Multilevel Structured Models , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Simon J. Godsill,et al.  On sequential Monte Carlo sampling methods for Bayesian filtering , 2000, Stat. Comput..

[16]  Kristine L. Bell,et al.  A Tutorial on Particle Filters for Online Nonlinear/NonGaussian Bayesian Tracking , 2007 .

[17]  Cristian Sminchisescu,et al.  Discriminative density propagation for 3D human motion estimation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[18]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.