3D face tracking and expression inference from a 2D sequence using manifold learning

We propose a person-dependent, manifold-based approach for modeling and tracking rigid and nonrigid 3D facial deformations from a monocular video sequence. The rigid and nonrigid motions are analyzed simultaneously in 3D by automatically fitting and tracking a set of landmarks. Rather than representing all nonrigid facial deformations as a single complex manifold, we decompose them on a basis of eight 1D manifolds. Each 1D manifold is learned offline from sequences of labeled expressions, such as smile and surprise. Any expression is then a linear combination of values along these eight axes, with each coefficient representing the level of activation along its axis. We experimentally verify that expressions can indeed be represented this way and that the individual manifolds are indeed 1D. The manifold dimensionality estimation, manifold learning, and manifold traversal operations are all implemented in the N-D tensor voting framework. Using simple local operations, this framework estimates the tangent and normal spaces at every sample and is robust to noise and outliers. The output of our system, besides the tracked landmarks in 3D, is a labeled annotation of the expression. We demonstrate results on a number of challenging sequences.
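
As a rough illustration of the decomposition described above, the following Python sketch composes a 3D landmark shape from activation levels along eight 1D manifolds. It is a minimal sketch under stated assumptions, not the method of the paper: each manifold is stood in for by an ordered list of toy samples traversed by linear interpolation (the paper uses N-D tensor voting for learning and traversal), and the landmark count, the names `make_toy_manifold`, `displacement_at`, and `compose_expression`, and all data are hypothetical.

```python
import numpy as np

N_LANDMARKS = 5   # hypothetical; the abstract does not state the landmark count
N_AXES = 8        # eight 1D expression manifolds, as stated in the abstract


def make_toy_manifold(rng, n_samples=20):
    """Return ordered samples along one toy 1D manifold.

    Each sample is a set of 3D landmark displacements from the neutral face.
    The path starts at zero displacement and bends slightly on its way to a
    random peak. This is synthetic data, not learned from labeled sequences.
    """
    peak = rng.normal(size=(N_LANDMARKS, 3))
    bend = 0.1 * rng.normal(size=(N_LANDMARKS, 3))
    t = np.linspace(0.0, 1.0, n_samples)[:, None, None]
    return t * peak + t * (1.0 - t) * bend  # shape (n_samples, N_LANDMARKS, 3)


def displacement_at(manifold, activation):
    """Interpolate linearly between ordered samples at an activation in [0, 1].

    Assumption: linear interpolation replaces the tensor-voting traversal.
    """
    a = float(np.clip(activation, 0.0, 1.0))
    pos = a * (len(manifold) - 1)
    lo = int(np.floor(pos))
    hi = min(lo + 1, len(manifold) - 1)
    w = pos - lo
    return (1.0 - w) * manifold[lo] + w * manifold[hi]


def compose_expression(neutral, manifolds, activations):
    """Add the displacement found on each axis to the neutral landmarks."""
    shape = neutral.copy()
    for manifold, activation in zip(manifolds, activations):
        shape += displacement_at(manifold, activation)
    return shape


rng = np.random.default_rng(0)
neutral = rng.normal(size=(N_LANDMARKS, 3))
manifolds = [make_toy_manifold(rng) for _ in range(N_AXES)]

# Hypothetical activation vector: axis 0 strongly active, axis 3 weakly active.
activations = np.zeros(N_AXES)
activations[0] = 0.7
activations[3] = 0.2
shape = compose_expression(neutral, manifolds, activations)
```

In this reading, the activation vector doubles as the expression annotation: the axis with the largest coefficient names the dominant expression, and an all-zero vector returns the neutral face.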
