Corrective 3D reconstruction of lips from monocular video

In facial animation, the accurate shape and motion of the lips of virtual humans are of paramount importance, since subtle nuances in mouth expression strongly influence the interpretation of speech and the conveyed emotion. Unfortunately, passive photometric reconstruction of expressive lip motions, such as a kiss or lip rolling, is fundamentally hard even with multi-view methods in controlled studios. To alleviate this problem, we present a novel approach for fully automatic reconstruction of detailed and expressive lip shapes, along with the dense geometry of the entire face, from just monocular RGB video. To this end, we learn the difference between the inaccurate lip shapes found by a state-of-the-art monocular facial performance capture approach and the true 3D lip shapes reconstructed using a high-quality multi-view system in combination with applied lip tattoos that are easy to track. A robust gradient-domain regressor is trained to infer accurate lip shapes from coarse monocular reconstructions, with the additional help of automatically extracted inner and outer 2D lip contours. We show quantitatively and qualitatively that our monocular approach reconstructs higher-quality lip shapes than previous monocular approaches, even for complex shapes such as a kiss or lip rolling. Furthermore, we compare the performance of person-specific and multi-person generic regression strategies and show that our approach generalizes to new individuals and general scenes, enabling high-fidelity reconstruction even from commodity video footage.
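
The core idea of learning a correction from paired coarse and accurate lip shapes can be illustrated with a minimal sketch. It is not the regressor described above: it swaps in plain ridge regression on vertex positions instead of a gradient-domain formulation, and it omits the 2D lip contour features. The function names, array layout (frames x vertices x 3), and regularization weight are all assumptions made for illustration.

```python
import numpy as np


def fit_corrective_regressor(coarse, accurate, reg=1e-3):
    """Fit a ridge regressor that predicts corrections (accurate - coarse).

    coarse, accurate: arrays of shape (n_frames, n_vertices, 3) holding
    paired lip shapes (hypothetical layout, assumed for this sketch).
    Returns (W, x_mean, y_mean).
    """
    X = coarse.reshape(len(coarse), -1)            # flatten each frame
    Y = accurate.reshape(len(accurate), -1) - X    # per-frame correction
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # Closed-form ridge solution: W = (Xc^T Xc + reg * I)^-1 Xc^T Yc
    A = Xc.T @ Xc + reg * np.eye(Xc.shape[1])
    W = np.linalg.solve(A, Xc.T @ Yc)
    return W, x_mean, y_mean


def apply_correction(coarse_shape, model):
    """Add the predicted correction to one coarse shape of shape (n_vertices, 3)."""
    W, x_mean, y_mean = model
    x = coarse_shape.reshape(-1)
    delta = (x - x_mean) @ W + y_mean
    return (x + delta).reshape(coarse_shape.shape)
```

In this sketch, training pairs would come from frames where both a coarse monocular reconstruction and a multi-view ground-truth shape are available; at test time only the coarse shape is needed.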
