Learning-Based Facial Rearticulation Using Streams of 3D Scans

In this paper, we present a new approach that generates synthetic mouth articulations from an audio file and transfers them to different face meshes. It is based on learning articulations from a stream of 3D scans of a real person, acquired by a structured light scanner at 40 three-dimensional frames per second. Correspondence between these scans over several speech sequences is established via optical flow. We propose a novel type of Principal Component Analysis that considers variances only in a sub-region of the face while retaining the full dimensionality of the original vector space of sample scans. Audio is recorded simultaneously, so the head scans can be synchronized with phoneme and viseme information for computing viseme clusters. Given a new audio sequence along with text data, we can quickly create, in a fully automated fashion, an animation synchronized with the new sentence by morphing between visemes along a path in viseme space. The methods described in this paper include an automated process for analyzing streams of 3D scans and a framework that connects the system to existing static face modeling technology for articulation transfer.
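The region-restricted PCA and the viseme-path morphing lend themselves to a compact illustration. The Python sketch below shows one plausible reading of both steps under stated assumptions: the function names are hypothetical, the least-squares re-expression of the region-driven components in the full vertex space is our interpretation of "retaining the full dimensionality", and the piecewise-linear morph stands in for the paper's actual path traversal, which may include additional coarticulation weighting.

    import numpy as np

    def region_restricted_pca(X, weights, n_components=10):
        # X       : (n_scans, 3*n_vertices) flattened scan coordinates
        # weights : (3*n_vertices,) per-coordinate mask, e.g. 1 inside the
        #           mouth region and 0 (or a smooth falloff) outside.
        mean = X.mean(axis=0)
        Xc = X - mean
        # The eigen-analysis is driven only by variance inside the region...
        Xm = Xc * weights
        U, S, Vt = np.linalg.svd(Xm, full_matrices=False)
        scores = U[:, :n_components] * S[:n_components]   # (n_scans, k)
        # ...but each component is re-expressed over the FULL vertex set by
        # regressing the unmasked data on the region scores (an assumption,
        # not necessarily the paper's exact construction).
        components, _, _, _ = np.linalg.lstsq(scores, Xc, rcond=None)
        return mean, components, scores

    def morph_along_viseme_path(keys, t):
        # keys : (n_keys, 3*n_vertices) viseme key shapes, e.g. cluster means
        # t    : scalar in [0, n_keys - 1], position along the viseme path
        i = min(int(np.floor(t)), len(keys) - 2)
        a = t - i
        # Piecewise-linear blend between successive key shapes.
        return (1.0 - a) * keys[i] + a * keys[i + 1]

    # Hypothetical usage: 200 scans of 5000 vertices, mouth-region mask w
    # X = np.stack([scan.flatten() for scan in scans])    # (200, 15000)
    # mean, comps, scores = region_restricted_pca(X, w, n_components=12)
    # frame = mean + morph_along_viseme_path(viseme_keys - mean, t)

Restricting the covariance to the mouth region keeps irrelevant variation (eye blinks, rigid head residue) from dominating the basis, while the full-space components still move surrounding vertices coherently.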
