Facial Performance Transfer via Deformable Models and Parametric Correspondence

The issue of transferring facial performance from one person's face to another's has been an area of interest for the movie industry and the computer graphics community for quite some time. In recent years, deformable face models, such as the Active Appearance Model (AAM), have made it possible to track and synthesize faces in real time. Not surprisingly, deformable face model-based approaches for facial performance transfer have gained tremendous interest in the computer vision and graphics community. In this paper, we focus on the problem of real-time facial performance transfer using the AAM framework. We propose a novel approach of learning the mapping between the parameters of two completely independent AAMs, using them to facilitate the facial performance transfer in a more realistic manner than previous approaches. The main advantage of modeling this parametric correspondence is that it allows a "meaningful” transfer of both the nonrigid shape and texture across faces irrespective of the speakers' gender, shape, and size of the faces, and illumination conditions. We explore linear and nonlinear methods for modeling the parametric correspondence between the AAMs and show that the sparse linear regression method performs the best. Moreover, we show the utility of the proposed framework for a cross-language facial performance transfer that is an area of interest for the movie dubbing industry.

[1]  Jun-yong Noh,et al.  Expression cloning , 2001, SIGGRAPH.

[2]  Simon Baker,et al.  Lucas-Kanade 20 Years On: A Unifying Framework , 2004, International Journal of Computer Vision.

[3]  Neil D. Lawrence,et al.  Sparse Convolved Gaussian Processes for Multi-output Regression , 2008, NIPS.

[4]  Timothy F. Cootes,et al.  Interpreting face images using active appearance models , 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[5]  Stephen P. Boyd,et al.  An Interior-Point Method for Large-Scale $\ell_1$-Regularized Least Squares , 2007, IEEE Journal of Selected Topics in Signal Processing.

[6]  Jovan Popovic,et al.  Deformation transfer for triangle meshes , 2004, ACM Trans. Graph..

[7]  Roland Göcke,et al.  Facial Expression Based Automatic Album Creation , 2010, ICONIP.

[8]  Tony Ezzat,et al.  Transferable videorealistic speech animation , 2005, SCA '05.

[9]  Hanspeter Pfister,et al.  Face transfer with multilinear models , 2005, ACM Trans. Graph..

[10]  D. Donoho For most large underdetermined systems of linear equations the minimal 𝓁1‐norm solution is also the sparsest solution , 2006 .

[11]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[12]  Andrew Zisserman,et al.  Multiple View Geometry in Computer Vision (2nd ed) , 2003 .

[13]  Sami Romdhani,et al.  Face image analysis using a multiple features fitting strategy , 2005 .

[14]  Zicheng Liu,et al.  Expressive expression mapping with ratio images , 2001, SIGGRAPH.

[15]  Dani Lischinski,et al.  Data-driven enhancement of facial attractiveness , 2008, ACM Trans. Graph..

[16]  Tony Jan,et al.  An adjustable model for linear to nonlinear regression , 1999, IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339).

[17]  Roland Göcke,et al.  A SSIM-based approach for finding similar facial expressions , 2011, Face and Gesture 2011.

[18]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[19]  Michael Jones,et al.  Multidimensional Morphable Models: A Framework for Representing and Matching Object Classes , 2004, International Journal of Computer Vision.

[20]  P. Vácha Texture Similarity Measure , 2005 .

[21]  Barry-John Theobald,et al.  Real-time expression cloning using appearance models , 2007, ICMI '07.

[22]  Marcus R. Frean,et al.  Dependent Gaussian Processes , 2004, NIPS.

[23]  Tomaso A. Poggio,et al.  Reanimating Faces in Images and Video , 2003, Comput. Graph. Forum.

[24]  Roland Göcke,et al.  The audio-video australian English speech data corpus AVOZES , 2012, INTERSPEECH.

[25]  David Salesin,et al.  Synthesizing realistic facial expressions from photographs , 1998, SIGGRAPH.