Dense 3D motion capture for human faces

This paper proposes a novel approach to motion capture from multiple, synchronized video streams, specifically aimed at recording dense and accurate models of the structure and motion of highly deformable surfaces such as skin, that stretches, shrinks, and shears in the midst of normal facial expressions. Solving this problem is a key step toward effective performance capture for the entertainment industry, but progress so far has been hampered by the lack of appropriate local motion and smoothness models. The main technical contribution of this paper is a novel approach to regularization adapted to nonrigid tangential deformations. Concretely, we estimate the nonrigid deformation parameters at each vertex of a surface mesh, smooth them over a local neighborhood for robustness, and use them to regularize the tangential motion estimation. To demonstrate the power of the proposed approach, we have integrated it into our previous work for markerless motion capture [9], and compared the performances of the original and new algorithms on three extremely challenging face datasets that include highly nonrigid skin deformations, wrinkles, and quickly changing expressions. Additional experiments with a dataset featuring fast-moving cloth with complex and evolving fold structures demonstrate that the adaptability of the proposed regularization scheme to nonrigid tangential motion does not hamper its robustness, since it successfully recovers the shape and motion of the cloth without overfitting it despite the absence of stretch or shear in this case.

[1]  Olivier D. Faugeras,et al.  Multi-View Stereo Reconstruction and Scene Flow Estimation with a Global Image-Based Matching Score , 2007, International Journal of Computer Vision.

[2]  Hans-Peter Seidel,et al.  Performance capture from sparse multi-view video , 2008, ACM Trans. Graph..

[3]  Li Zhang,et al.  Spacetime faces: high resolution capture for modeling and animation , 2004, SIGGRAPH 2004.

[4]  Yiannis Aloimonos,et al.  Spatio-Temporal Stereo Using Multi-Resolution Subdivision Surfaces , 2004, International Journal of Computer Vision.

[5]  Jean Ponce,et al.  Dense 3D motion capture from synchronized video streams , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Michael M. Kazhdan,et al.  Poisson surface reconstruction , 2006, SGP '06.

[7]  Katsushi Ikeuchi,et al.  Shape representation and image segmentation using deformable surfaces , 1992, Image Vis. Comput..

[8]  Gérard Bailly,et al.  Shape and appearance models of talking faces for model-based tracking , 2003, 2003 IEEE International SOI Conference. Proceedings (Cat. No.03CH37443).

[9]  Radu Horaud,et al.  Temporal Surface Tracking Using Mesh Evolution , 2008, ECCV.

[10]  Björn Stenger,et al.  Non-rigid Photometric Stereo with Colored Lights , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[11]  Wojciech Matusik,et al.  Articulated mesh animation from multi-view silhouettes , 2008, ACM Trans. Graph..

[12]  Takeo Kanade,et al.  Image-based spatio-temporal modeling and view interpolation of dynamic events , 2005, TOGS.

[13]  Rui Li,et al.  Multi-Scale 3D Scene Flow from Binocular Stereo Sequences , 2005, 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05) - Volume 1.

[14]  Kiriakos N. Kutulakos,et al.  Multi-View Scene Capture by Surfel Sampling: From Video Streams to Non-Rigid 3D Motion, Shape and Reflectance , 2002, International Journal of Computer Vision.

[15]  M. Otaduy,et al.  Multi-scale capture of facial geometry and motion , 2007, ACM Trans. Graph..

[16]  Seth J. Teller,et al.  Particle Video: Long-Range Motion Estimation Using Point Trajectories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Jing Xiao,et al.  Multi-view AAM fitting and camera calibration , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  Simon Baker,et al.  Active Appearance Models Revisited , 2004, International Journal of Computer Vision.

[19]  Jessica K. Hodgins,et al.  Capturing and animating skin deformation in human motion , 2006, SIGGRAPH '06.

[20]  David A. Forsyth,et al.  Capturing and animating occluded cloth , 2007, ACM Trans. Graph..