Homeomorphic Manifold Analysis (HMA): Generalized separation of style and content on manifolds

Separating style from content is an essential element of visual perception and remains one of its fundamental open problems. The problem arises in many computer vision applications. This paper addresses the separation of style and content when the content lies on a low-dimensional nonlinear manifold representing a dynamic object. We show that this setting appears in many human motion analysis problems, and we introduce a framework for learning a parameterization of style and content in such settings. Given a set of topologically equivalent manifolds, the Homeomorphic Manifold Analysis (HMA) framework models the variation in their geometries in the space of functions that map between a topologically equivalent common representation and each manifold. The framework decomposes the style parameters in the space of nonlinear functions that map between a unified embedded representation of the content manifold and style-dependent visual observations. We demonstrate the framework on synthesis, recognition, and tracking of human motions that fit this setting, such as gait and facial expressions.
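The idea of decomposing style in the space of mapping functions can be illustrated with a small numerical sketch. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the unit circle as the common representation (a natural choice for a periodic motion such as gait), Gaussian radial basis functions for the nonlinear mappings, a plain SVD for the decomposition, the synthetic three-dimensional "observations", and all function names.

```python
import numpy as np

def rbf_features(t, centers, width=0.5):
    """Gaussian RBF features of phases t embedded on the unit circle, plus a bias column."""
    pts = np.column_stack([np.cos(t), np.sin(t)])
    ctr = np.column_stack([np.cos(centers), np.sin(centers)])
    d2 = ((pts[:, None, :] - ctr[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.exp(-d2 / (2.0 * width ** 2)), np.ones((len(t), 1))])

def fit_style_mappings(observations, phases, centers, reg=1e-6):
    """Fit one ridge-regularized mapping per style from the circle to observation space."""
    coeffs = []
    for Y, t in zip(observations, phases):
        Phi = rbf_features(t, centers)
        A = Phi.T @ Phi + reg * np.eye(Phi.shape[1])
        coeffs.append(np.linalg.solve(A, Phi.T @ Y))
    return np.stack(coeffs)  # shape: (styles, features, observation_dim)

def decompose_styles(coeffs, n_style_dims):
    """Factor the stacked mapping coefficients into style vectors and a shared basis."""
    flat = coeffs.reshape(coeffs.shape[0], -1)
    U, s, Vt = np.linalg.svd(flat, full_matrices=False)
    return U[:, :n_style_dims] * s[:n_style_dims], Vt[:n_style_dims]

def synthesize(style_vector, basis, t, centers, coeff_shape):
    """Rebuild a mapping from a style vector and evaluate it at phases t."""
    B = (style_vector @ basis).reshape(coeff_shape)
    return rbf_features(t, centers) @ B

# Synthetic data: three "styles" of one periodic motion, sampled at different rates.
centers = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
amplitudes = [(1.0, 0.5, 0.2), (0.8, 0.7, 0.1), (1.2, 0.4, 0.3)]
phases, observations = [], []
for n_samples, (a, b, c) in zip((40, 50, 60), amplitudes):
    t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    phases.append(t)
    observations.append(np.column_stack([a * np.cos(t), b * np.sin(t), c * np.cos(2 * t)]))

coeffs = fit_style_mappings(observations, phases, centers)
style_vectors, basis = decompose_styles(coeffs, n_style_dims=3)

# Re-synthesize style 0 at new phases and compare with the generating function.
t_new = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
recon = synthesize(style_vectors[0], basis, t_new, centers, coeffs.shape[1:])
a, b, c = amplitudes[0]
truth = np.column_stack([a * np.cos(t_new), b * np.sin(t_new), c * np.cos(2 * t_new)])
max_error = np.abs(recon - truth).max()
```

Because every style is mapped from the same embedded representation, the sequences do not need to be sampled at the same phases or rates; only the mapping coefficients are compared. In this sketch, a new style could be synthesized by passing a linear combination of rows of `style_vectors` to `synthesize`.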
