Headon

We propose HeadOn, the first real-time source-to-target reenactment approach for complete human portrait videos that enables transfer of torso and head motion, face expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel realtime reenactment algorithm employs this proxy to photo-realistically map the captured motion from the source actor to the target actor. On top of the coarse geometric proxy, we propose a video-based rendering technique that composites the modified target portrait video via view- and pose-dependent texturing, and creates photo-realistic imagery of the target actor under novel torso and head poses, facial expressions, and gaze directions. To this end, we propose a robust tracking of the face and torso of the source actor. We extensively evaluate our approach and show significant improvements in enabling much greater flexibility in creating realistic reenacted output videos.

[1]  Ira Kemelmacher-Shlizerman,et al.  Total Moving Face Reconstruction , 2014, ECCV.

[2]  Richard Szeliski,et al.  The lumigraph , 1996, SIGGRAPH.

[3]  Ken-ichi Anjyo,et al.  Practice and Theory of Blendshape Facial Models , 2014, Eurographics.

[4]  Kun Zhou,et al.  Displaced dynamic expression regression for real-time facial tracking and animation , 2014, ACM Trans. Graph..

[5]  Matthias Nießner,et al.  Real-time 3D reconstruction at scale using voxel hashing , 2013, ACM Trans. Graph..

[6]  Patrick Pérez,et al.  Automatic Face Reenactment , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Hao Li,et al.  Realtime performance-based facial animation , 2011, ACM Trans. Graph..

[8]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[9]  Simon Lucey,et al.  Deformable Model Fitting by Regularized Landmark Mean-Shift , 2010, International Journal of Computer Vision.

[10]  Victor S. Lempitsky,et al.  DeepWarp: Photorealistic Image Resynthesis for Gaze Manipulation , 2016, ECCV.

[11]  Xin Tong,et al.  Accurate and Robust 3D Facial Capture Using a Single RGBD Camera , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Ronald Fedkiw,et al.  Automatic determination of facial muscle activations from sparse motion capture marker data , 2005, ACM Trans. Graph..

[13]  Justus Thies,et al.  FaceVR , 2018, ACM Trans. Graph..

[14]  Michael J. Black,et al.  Learning a model of facial shape and expression from 4D scans , 2017, ACM Trans. Graph..

[15]  Patrick Pérez,et al.  State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications , 2018, Comput. Graph. Forum.

[16]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[17]  Derek Bradley,et al.  Recent Advances in Facial Appearance Capture , 2015, Comput. Graph. Forum.

[18]  Ira Kemelmacher-Shlizerman,et al.  Transfiguring portraits , 2016, ACM Trans. Graph..

[19]  Stefanos Zafeiriou,et al.  Large Scale 3D Morphable Models , 2017, International Journal of Computer Vision.

[20]  Hao Li,et al.  Avatar digitization from a single image for real-time rendering , 2017, ACM Trans. Graph..

[21]  Chongyang Ma,et al.  Facial performance sensing head-mounted display , 2015, ACM Trans. Graph..

[22]  Jihun Yu,et al.  Realtime facial animation with on-the-fly correctives , 2013, ACM Trans. Graph..

[23]  Leonard McMillan,et al.  Dynamically reparameterized light fields , 2000, SIGGRAPH.

[24]  Matthias Nießner,et al.  BundleFusion , 2016, TOGS.

[25]  Tomaso A. Poggio,et al.  Reanimating Faces in Images and Video , 2003, Comput. Graph. Forum.

[26]  Christian Theobalt,et al.  Reconstructing detailed dynamic face geometry from monocular video , 2013, ACM Trans. Graph..

[27]  Shu Liang,et al.  3D Face Hallucination from a Single Depth Frame , 2014, 2014 2nd International Conference on 3D Vision.

[28]  Hanspeter Pfister,et al.  Face transfer with multilinear models , 2005, SIGGRAPH 2005.

[29]  Daniel Cohen-Or,et al.  Bringing portraits to life , 2017, ACM Trans. Graph..

[30]  Kun Zhou,et al.  3D shape regression for real-time facial animation , 2013, ACM Trans. Graph..

[31]  Erika Chuang,et al.  Performance Driven Facial Animation using Blendshape Interpolation , 2002 .

[32]  Joseph J. Lim,et al.  High-fidelity facial and speech animation for VR HMDs , 2016, ACM Trans. Graph..

[33]  Ira Kemelmacher-Shlizerman,et al.  Synthesizing Obama , 2017, ACM Trans. Graph..

[34]  Paul J. Besl,et al.  A Method for Registration of 3-D Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[35]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[36]  Luc Van Gool,et al.  Face/Off: live facial puppetry , 2009, SCA '09.

[37]  Andrew Jones,et al.  Driving High-Resolution Facial Scans with Video Performance Capture , 2014, ACM Trans. Graph..

[38]  Ira Kemelmacher-Shlizerman,et al.  What Makes Tom Hanks Look Like Tom Hanks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[39]  Simon Lucey,et al.  Deformable model fitting with a mixture of local experts , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[40]  Andrew W. Fitzgibbon,et al.  Real-time non-rigid reconstruction using an RGB-D camera , 2014, ACM Trans. Graph..

[41]  Wojciech Matusik,et al.  Video face replacement , 2011, ACM Trans. Graph..

[42]  Luc Van Gool,et al.  Pose Guided Person Image Generation , 2017, NIPS.

[43]  Xin Tong,et al.  Automatic acquisition of high-fidelity facial performances using monocular videos , 2014, ACM Trans. Graph..

[44]  Mark Pauly,et al.  Dynamic 3D avatar creation from hand-held video input , 2015, ACM Trans. Graph..

[45]  Paul Debevec,et al.  The Digital Emily project: photoreal facial modeling and animation , 2009, SIGGRAPH '09.

[46]  Kok-Lim Low Linear Least-Squares Optimization for Point-to-Plane ICP Surface Registration , 2004 .

[47]  Gérard G. Medioni,et al.  Object modelling by registration of multiple range images , 1992, Image Vis. Comput..

[48]  William E. Lorensen,et al.  Marching cubes: A high resolution 3D surface construction algorithm , 1987, SIGGRAPH.

[49]  Christian Theobalt,et al.  Reconstruction of Personalized 3D Face Rigs from Monocular Video , 2016, ACM Trans. Graph..

[50]  Justus Thies,et al.  Demo of FaceVR: real-time facial reenactment and eye gaze control in virtual reality , 2016, SIGGRAPH Emerging Technologies.

[51]  David Salesin,et al.  Surface light fields for 3D photography , 2000, SIGGRAPH.

[52]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[53]  Yangang Wang,et al.  Online modeling for realtime facial animation , 2013, ACM Trans. Graph..

[54]  Jing Xiao,et al.  Vision-based control of 3D facial animation , 2003, SCA '03.

[55]  Thabo Beeler,et al.  Real-time high-fidelity facial performance capture , 2015, ACM Trans. Graph..

[56]  Patrick Pérez,et al.  Poisson image editing , 2003, ACM Trans. Graph..

[57]  Jihun Yu,et al.  Unconstrained realtime facial performance capture , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Michael Goesele,et al.  Image-based rendering in the gradient domain , 2013, ACM Trans. Graph..

[59]  Justus Thies,et al.  Face2Face: real-time face capture and reenactment of RGB videos , 2019, Commun. ACM.

[60]  David Salesin,et al.  Synthesizing realistic facial expressions from photographs , 1998, SIGGRAPH.

[61]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[62]  Takeo Kanade,et al.  Real-time combined 2D+3D active appearance models , 2004, CVPR 2004.

[63]  Shigeo Morishima,et al.  Data-Driven Speech Animation Synthesis Focusing on Realistic Inside of the Mouth , 2014, J. Inf. Process..

[64]  Stefanos Zafeiriou,et al.  A 3D Morphable Model Learnt from 10,000 Faces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[65]  Patrick Pérez,et al.  VDub: Modifying Face Video of Actors for Plausible Visual Alignment to a Dubbed Audio Track , 2015, Comput. Graph. Forum.

[66]  Mark Pauly,et al.  Phace: physics-based face modeling and animation , 2017, ACM Trans. Graph..

[67]  Fernando De la Torre,et al.  Interactive region-based linear 3D face models , 2011, SIGGRAPH 2011.

[68]  Harry Shum,et al.  Image-based rendering , 2006, Found. Trends Comput. Graph. Vis..

[69]  Marc Levoy,et al.  A volumetric method for building complex models from range images , 1996, SIGGRAPH.

[70]  Marc Levoy,et al.  Efficient variants of the ICP algorithm , 2001, Proceedings Third International Conference on 3-D Digital Imaging and Modeling.

[71]  Thomas Vetter,et al.  A morphable model for the synthesis of 3D faces , 1999, SIGGRAPH.

[72]  Kun Zhou,et al.  Real-time facial animation with image-based dynamic avatars , 2016, ACM Trans. Graph..

[73]  Justus Thies,et al.  Real-time expression transfer for facial reenactment , 2015, ACM Trans. Graph..

[74]  Hans-Peter Seidel,et al.  Lightweight binocular facial performance capture under uncontrolled lighting , 2012, ACM Trans. Graph..