Generative modeling for continuous non-linearly embedded visual inference

Many difficult visual perception problems, like 3D human motion estimation, can be formulated in terms of inference using complex generative models, defined over high-dimensional state spaces. Despite progress, optimizing such models is difficult because prior knowledge cannot be flexibly integrated in order to reshape an initially designed representation space. Nonlinearities, inherent sparsity of high-dimensional training sets, and lack of global continuity makes dimensionality reduction challenging and low-dimensional search inefficient. To address these problems, we present a learning and inference algorithm that restricts visual tracking to automatically extracted, non-linearly embedded, low-dimensional spaces. This formulation produces a layered generative model with reduced state representation, that can be estimated using efficient continuous optimization methods. Our prior flattening method allows a simple analytic treatment of low-dimensional intrinsic curvature constraints, and allows consistent interpolation operations. We analyze reduced manifolds for human interaction activities, and demonstrate that the algorithm learns continuous generative models that are useful for tracking and for the reconstruction of 3D human motion in monocular video.

[1]  David J. Fleet,et al.  People tracking using hybrid Monte Carlo filtering , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[2]  Michael J. Black,et al.  Implicit Probabilistic Models of Human Motion for Synthesis and Tracking , 2002, ECCV.

[3]  Michael Isard,et al.  CONDENSATION—Conditional Density Propagation for Visual Tracking , 1998, International Journal of Computer Vision.

[4]  Cristian Sminchisescu,et al.  Kinematic jump processes for monocular 3D human tracking , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[5]  M. Levas OBBTree : A Hierarchical Structure for Rapid Interference Detection , .

[6]  Mikhail Belkin,et al.  Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[7]  Kilian Q. Weinberger,et al.  Unsupervised Learning of Image Manifolds by Semidefinite Programming , 2004, CVPR.

[8]  Andrew Blake,et al.  Probabilistic tracking in a metric space , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[9]  Geoffrey E. Hinton,et al.  A Mode-Hopping MCMC sampler , 2003 .

[10]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[11]  Qiang Wang,et al.  Learning object intrinsic structure for robust visual tracking , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[12]  Cristian Sminchisescu,et al.  Estimating Articulated Human Motion with Covariance Scaled Sampling , 2003, Int. J. Robotics Res..

[13]  M. R. Osborne,et al.  On the LASSO and its Dual , 2000 .

[14]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[15]  C. Sminchisescu,et al.  Variational mixture smoothing for non-linear dynamical systems , 2004, CVPR 2004.

[16]  Stephen M. Omohundro,et al.  Nonlinear manifold learning for visual speech recognition , 1995, Proceedings of IEEE International Conference on Computer Vision.

[17]  D. Donoho,et al.  Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[18]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[19]  Dhairya Desai,et al.  Visual Speech Recognition , 2020 .

[20]  Joshua B. Tenenbaum,et al.  Global Versus Local Methods in Nonlinear Dimensionality Reduction , 2002, NIPS.

[21]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[22]  William T. Freeman,et al.  Bayesian Reconstruction of 3D Human Motion from Single-Camera Video , 1999, NIPS.

[23]  Yee Whye Teh,et al.  Automatic Alignment of Local Representations , 2002, NIPS.

[24]  Nicolas Le Roux,et al.  Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering , 2003, NIPS.

[25]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.