FaceDirector: Continuous Control of Facial Performance in Video

We present a method to continuously blend between multiple facial performances of an actor, which can contain different facial expressions or emotional states. For example, given sad and angry video takes of a scene, our method lets the movie director specify arbitrary weighted combinations of, and smooth transitions between, the two takes in post-production. Our contributions include (1) a nonlinear audio-visual synchronization technique that exploits complementary properties of audio and visual cues to automatically determine robust, dense spatiotemporal correspondences between takes, and (2) a seamless facial blending approach that gives the director full control to interpolate timing, facial expression, and local appearance in order to generate novel performances after filming. In contrast to most previous work, our approach operates entirely in image space, avoiding the need for 3D facial reconstruction. We demonstrate that our method can synthesize visually believable performances, with applications in emotion transition, performance correction, and timing control.
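To make the idea of temporal correspondence followed by weighted blending concrete, the following is a minimal sketch and not the method of the paper. It assumes each take has already been reduced to a per-frame feature vector, substitutes plain dynamic time warping for the audio-visual synchronization, and substitutes a simple cross-dissolve for the image-space facial blending. The function names `dtw_path`, `blend_weights`, and `blend_aligned` are hypothetical and chosen for illustration only.

```python
import numpy as np


def dtw_path(a, b):
    """Minimum-cost monotonic alignment between two feature sequences.

    Stand-in for the synchronization step: classic dynamic time warping
    over per-frame features (an assumption, not the paper's technique).
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)
    # Pairwise Euclidean distance between every frame of a and every frame of b.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # acc[i, j] holds the cheapest cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the last frame pair to the first.
    i, j = n, m
    path = [(n - 1, m - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    return path[::-1]


def blend_weights(num_frames, start, end):
    """Per-frame weight that ramps smoothly from 0 to 1 between start and end."""
    t = np.clip((np.arange(num_frames) - start) / float(end - start), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)  # smoothstep easing


def blend_aligned(frames_a, frames_b, path, weights):
    """Cross-dissolve frame pairs along the path; weights are indexed by take A."""
    return [
        (1.0 - weights[i]) * frames_a[i] + weights[i] * frames_b[j]
        for i, j in path
    ]
```

In this sketch a weight of 0 reproduces the first take and a weight of 1 reproduces the second, so a ramp in `blend_weights` corresponds to a transition from one performance to the other. A plain cross-dissolve would produce ghosting on real footage; the abstract's dense spatial correspondences are what would replace it in practice.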
