Monocular 3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis

We propose a new statistical approach to human motion modeling and tracking that uses probabilistic latent semantic analysis (PLSA) models to describe the mapping from image features to 3D human pose estimates. PLSA has been used successfully to model the co-occurrence of dyadic data in problems such as image annotation, where image features are mapped to word categories through latent semantic variables. We apply PLSA to motion tracking by extending it to a sequential setting in which the latent variables describe intrinsic motion semantics that link human figure appearance to 3D pose estimates. This contrasts with many current methods, which learn the often high-dimensional image-to-pose mapping directly and use subspace projections only as a constraint on the pose space. As a consequence, such mappings may be more computationally complex and may generalize poorly. We demonstrate the utility of the proposed model on a synthetic dataset and on the task of 3D human motion tracking in monocular image sequences with arbitrary camera views. Our experiments show that the dynamic PLSA approach can produce accurate pose estimates at a fraction of the computational cost of alternative subspace tracking methods.
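
For readers unfamiliar with PLSA, the following is a minimal sketch of the standard static model that the abstract builds on, not the dynamic extension proposed here. It assumes the usual asymmetric factorization P(y | x) = sum over z of P(z | x) P(y | z) and fits it with EM on a small co-occurrence count table. The function names (`fit_plsa`, `log_likelihood`) and the toy counts are hypothetical and chosen only for illustration.

```python
import math
import random


def _normalize(row):
    """Scale a non-negative row to sum to 1 (uniform if it is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [v / total for v in row]


def fit_plsa(counts, n_latent, n_iters=50, seed=0):
    """Fit static PLSA by EM; counts[x][y] is a co-occurrence count.

    Returns (p_z_x, p_y_z): rows of P(z | x) and rows of P(y | z).
    """
    rng = random.Random(seed)
    n_x, n_y = len(counts), len(counts[0])
    # Random strictly positive initialisation of both conditional tables.
    p_z_x = [_normalize([rng.random() + 0.1 for _ in range(n_latent)])
             for _ in range(n_x)]
    p_y_z = [_normalize([rng.random() + 0.1 for _ in range(n_y)])
             for _ in range(n_latent)]
    for _ in range(n_iters):
        acc_z_x = [[0.0] * n_latent for _ in range(n_x)]
        acc_y_z = [[0.0] * n_y for _ in range(n_latent)]
        for x in range(n_x):
            for y in range(n_y):
                c = counts[x][y]
                if c == 0:
                    continue
                # E-step: P(z | x, y) is proportional to P(z | x) P(y | z).
                joint = [p_z_x[x][z] * p_y_z[z][y] for z in range(n_latent)]
                norm = sum(joint)
                for z in range(n_latent):
                    resp = c * joint[z] / norm
                    acc_z_x[x][z] += resp
                    acc_y_z[z][y] += resp
        # M-step: renormalise the expected counts into probabilities.
        p_z_x = [_normalize(row) for row in acc_z_x]
        p_y_z = [_normalize(row) for row in acc_y_z]
    return p_z_x, p_y_z


def log_likelihood(counts, p_z_x, p_y_z):
    """Sum of counts[x][y] * log P(y | x) under the fitted model."""
    total = 0.0
    for x, row in enumerate(counts):
        for y, c in enumerate(row):
            if c:
                p = sum(pz * p_y_z[z][y] for z, pz in enumerate(p_z_x[x]))
                total += c * math.log(p)
    return total


# Toy co-occurrence table (illustrative only): rows could stand for
# image-feature bins and columns for word or pose categories.
toy_counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
p_z_x, p_y_z = fit_plsa(toy_counts, n_latent=2)
```

In the sequential setting described above, the latent variable would additionally be linked across time steps; that part is not shown here because the abstract does not specify its form.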
