3D Human Motion Tracking with a Coordinated Mixture of Factor Analyzers

A major challenge in applying Bayesian tracking methods to 3D human body pose is the high dimensionality of the pose state space. It has been observed that 3D human body pose parameters can typically be assumed to lie on a low-dimensional manifold embedded in the high-dimensional space. The goal of this work is to approximate that manifold so that a low-dimensional state vector can be obtained for efficient and effective Bayesian tracking. To achieve this goal, a globally coordinated mixture of factor analyzers is learned from motion capture data. Each factor analyzer in the mixture is a “locally linear dimensionality reducer” that approximates a part of the manifold. The global parametrization of the manifold is obtained by aligning these locally linear pieces in a global coordinate system. To enable automatic selection of the number of factor analyzers and the dimensionality of the manifold, a variational Bayesian formulation of the globally coordinated mixture of factor analyzers is proposed. The advantages of the proposed model are demonstrated in a multiple hypothesis tracker for 3D human body pose. Quantitative comparisons on benchmark datasets show that the proposed method produces more accurate 3D pose estimates over time than two previously proposed Bayesian tracking methods.
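To make the idea of aligning locally linear pieces concrete, the following is a minimal sketch of how a pose vector could be mapped to a global low-dimensional coordinate once a coordinated mixture of factor analyzers has been fitted. It is not the paper's implementation: the function names, the parameter names (`pis`, `mus`, `Lambdas`, `Psis`, `As`, `kappas`) and the toy values are assumptions chosen for illustration, the parameters are treated as fixed point estimates, and the variational Bayesian learning step is not shown. Each analyzer is assumed to have a diagonal noise covariance and an affine map from its local latent coordinate to the global coordinate.

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)


def global_coordinate(x, pis, mus, Lambdas, Psis, As, kappas):
    """Map a high-dimensional point x to a global low-dimensional coordinate.

    Hypothetical parameters, one entry per factor analyzer s:
      pis[s]     mixing weight
      mus[s]     mean in the high-dimensional space
      Lambdas[s] factor loading matrix (D x d)
      Psis[s]    diagonal of the noise covariance (length D)
      As[s], kappas[s]  assumed affine map from local to global coordinates
    """
    log_w, local_g = [], []
    for pi, mu, Lam, Psi, A, kappa in zip(pis, mus, Lambdas, Psis, As, kappas):
        # Marginal covariance of x under analyzer s.
        cov = Lam @ Lam.T + np.diag(Psi)
        log_w.append(np.log(pi) + log_gaussian(x, mu, cov))
        # Posterior mean of the local latent coordinate given x and s.
        z_mean = Lam.T @ np.linalg.solve(cov, x - mu)
        # Align the local coordinate with the global coordinate system.
        local_g.append(A @ z_mean + kappa)
    log_w = np.array(log_w)
    # Responsibilities, normalized in a numerically stable way.
    resp = np.exp(log_w - log_w.max())
    resp /= resp.sum()
    # Responsibility-weighted average of the aligned local coordinates.
    return resp @ np.array(local_g), resp


# Toy example: one analyzer, 3-D observations, 1-D latent space.
pis = [1.0]
mus = [np.zeros(3)]
Lambdas = [np.array([[1.0], [0.0], [0.0]])]
Psis = [np.ones(3)]
As = [np.array([[3.0]])]
kappas = [np.array([1.0])]

g, resp = global_coordinate(np.array([2.0, 5.0, 7.0]),
                            pis, mus, Lambdas, Psis, As, kappas)
print(g, resp)
```

With several analyzers, the responsibilities blend the aligned local coordinates, so a point near the boundary between two linear pieces receives a global coordinate that varies smoothly between them. A tracker could then run in the space of `g` rather than in the full pose space.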
