Learning Generative Models for Multi-Activity Body Pose Estimation

We present a method to simultaneously estimate 3D body pose and action categories from monocular video sequences. Our approach learns a generative model of the relationship of body pose and image appearance using a sparse kernel regressor. Body poses are modelled on a low-dimensional manifold obtained by Locally Linear Embedding dimensionality reduction. In addition, we learn a prior model of likely body poses and a dynamical model in this pose manifold. Sparse kernel regressors capture the nonlinearities of this mapping efficiently. Within a Recursive Bayesian Sampling framework, the potentially multimodal posterior probability distributions can then be inferred. An activity-switching mechanism based on learned transfer functions allows for inference of the performed activity class, along with the estimation of body pose and 2D image location of the subject. Using a rough foreground segmentation, we compare Binary PCA and distance transforms to encode the appearance. As a postprocessing step, the globally optimal trajectory through the entire sequence is estimated, yielding a single pose estimate per frame that is consistent throughout the sequence. We evaluate the algorithm on challenging sequences with subjects that are alternating between running and walking movements. Our experiments show how the dynamical model helps to track through poorly segmented low-resolution image sequences where tracking otherwise fails, while at the same time reliably classifying the activity type.

[1]  Mario Sznaier,et al.  Dynamic Appearance Modeling for Human Tracking , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[2]  Andrew W. Fitzgibbon,et al.  The Joint Manifold Model for Semi-supervised Multi-valued Regression , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[3]  Michael E. Tipping The Relevance Vector Machine , 1999, NIPS.

[4]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[5]  Jakob J. Verbeek,et al.  Transformation invariant component analysis for binary images , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  William T. Freeman,et al.  Understanding belief propagation and its generalizations , 2003 .

[7]  Sidharth Bhatia,et al.  Tracking loose-limbed people , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[8]  Cristian Sminchisescu,et al.  Discriminative density propagation for 3D human motion estimation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  David A. Forsyth,et al.  Computational Studies of Human Motion: Part 1, Tracking and Motion Synthesis , 2005, Found. Trends Comput. Graph. Vis..

[10]  Michael Isard,et al.  CONDENSATION—Conditional Density Propagation for Visual Tracking , 1998, International Journal of Computer Vision.

[11]  Ankur Agarwal,et al.  Monocular Human Motion Capture with a Mixture of Regressors , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Workshops.

[12]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[13]  Dima Damen,et al.  Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Björn Stenger,et al.  Multivariate Relevance Vector Machines for Tracking , 2006, ECCV.

[15]  Ahmed M. Elgammal,et al.  Modeling View and Posture Manifolds for Tracking , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[16]  Rui Li,et al.  Monocular Tracking of 3D Human Motion with a Coordinated Mixture of Factor Analyzers , 2006, ECCV.

[17]  Jr. G. Forney,et al.  The viterbi algorithm , 1973 .

[18]  Rómer Rosales,et al.  Learning Body Pose via Specialized Maps , 2001, NIPS.

[19]  David J. Fleet,et al.  Stochastic Tracking of 3D Human Figures Using 2D Image Motion , 2000, ECCV.

[20]  Michael Isard,et al.  Nonparametric belief propagation , 2010, Commun. ACM.

[21]  Neil D. Lawrence,et al.  Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models , 2005, J. Mach. Learn. Res..

[22]  Michael Isard,et al.  PAMPAS: real-valued graphical models for computer vision , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[23]  Michael Isard,et al.  A mixed-state condensation tracker with automatic model-switching , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[24]  Trevor Darrell,et al.  Inferring 3D structure with a statistical image-based shape model , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[25]  Luc Van Gool,et al.  Monocular Tracking with a Mixture of View-Dependent Learned Models , 2006, AMDO.

[26]  A. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, CVPR 2004.

[27]  Ankur Agarwal,et al.  Tracking Articulated Motion Using a Mixture of Autoregressive Models , 2004, ECCV.

[28]  David J. Fleet,et al.  3D People Tracking with Gaussian Process Dynamical Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[29]  David J. Fleet,et al.  Gaussian Process Dynamical Models , 2005, NIPS.

[30]  Rui Li,et al.  Simultaneous Learning of Nonlinear Manifold and Dynamical Models for High-dimensional Time Series , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[31]  Vladimir Pavlovic,et al.  Learning Switching Linear Models of Human Motion , 2000, NIPS.

[32]  Simon J. Godsill,et al.  Monte Carlo filtering and smoothing with application to time-varying spectral estimation , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[33]  X. Jin Factor graphs and the Sum-Product Algorithm , 2002 .

[34]  Cristian Sminchisescu,et al.  Generative modeling for continuous non-linearly embedded visual inference , 2004, ICML.

[35]  Simon J. Godsill,et al.  On sequential Monte Carlo sampling methods for Bayesian filtering , 2000, Stat. Comput..

[36]  Philip H. S. Torr,et al.  Regression-Based Human Motion Capture From Voxel Data , 2006, BMVC.

[37]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[38]  Donald G. Bailey,et al.  An Efficient Euclidean Distance Transform , 2004, IWCIA.

[39]  Niclas Wiberg,et al.  Codes and Decoding on General Graphs , 1996 .

[40]  B. Triggs,et al.  3D human pose from silhouettes by relevance vector regression , 2004, CVPR 2004.