Deep appearance models for face rendering

We introduce a deep appearance model for rendering the human face. Inspired by Active Appearance Models, we develop a data-driven rendering pipeline that learns a joint representation of facial geometry and appearance from a multiview capture setup. Vertex positions and view-specific textures are modeled using a deep variational autoencoder that captures complex nonlinear effects while producing a smooth and compact latent representation. View-specific texture enables the modeling of view-dependent effects such as specularity. It can also correct for imperfect geometry stemming from biased or low-resolution estimates. This is a significant departure from the traditional graphics pipeline, which requires highly accurate geometry as well as all elements of the shading model to achieve realism through physically inspired light transport. Acquiring such a high level of accuracy is difficult in practice, especially for complex and intricate parts of the face, such as eyelashes and the oral cavity. These are handled naturally by our approach, which does not rely on precise estimates of geometry. Instead, the shading model accommodates deficiencies in geometry through the flexibility afforded by the neural network employed. At inference time, we condition the decoding network on the viewpoint of the camera in order to generate the appropriate texture for rendering. The resulting system can be implemented simply using existing rendering engines through dynamic textures with flat lighting. This representation, together with a novel unsupervised technique for mapping images to facial states, results in a system that is naturally suited to real-time interactive settings such as virtual reality (VR).
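
To make the view-conditioned decoding concrete, the following is a minimal sketch in plain Python, not the architecture used in the paper. It assumes single linear layers in place of deep networks, random untrained weights, and a 3-vector view direction; the names `ViewConditionedVAE`, `reparameterize`, and `kl_to_standard_normal`, and the choice to decode geometry from the latent code alone, are illustrative assumptions. It shows only the data flow: geometry and texture are encoded into one latent code, and the texture decoder additionally receives the camera viewpoint.

```python
import math
import random


def init_linear(n_out, n_in, rng):
    """Random weights and zero bias for a toy linear layer (hypothetical helper)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Compute y = W x + b for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


class ViewConditionedVAE:
    """Toy stand-in: joint latent code for geometry and texture (assumed layout)."""

    def __init__(self, n_geom, n_tex, n_latent, n_view=3, seed=0):
        self.rng = random.Random(seed)
        n_in = n_geom + n_tex
        self.enc_mu = init_linear(n_latent, n_in, self.rng)
        self.enc_logvar = init_linear(n_latent, n_in, self.rng)
        # Assumption: geometry depends on the latent code only.
        self.dec_geom = init_linear(n_geom, n_latent, self.rng)
        # The texture decoder also sees the view vector, so it can vary with viewpoint.
        self.dec_tex = init_linear(n_tex, n_latent + n_view, self.rng)

    def encode(self, geom, tex):
        x = geom + tex  # list concatenation of the two inputs
        return linear(x, *self.enc_mu), linear(x, *self.enc_logvar)

    def decode(self, z, view):
        geom = linear(z, *self.dec_geom)
        tex = linear(z + view, *self.dec_tex)  # latent code concatenated with view
        return geom, tex


model = ViewConditionedVAE(n_geom=6, n_tex=8, n_latent=2)
mu, logvar = model.encode([0.1] * 6, [0.5] * 8)
z = reparameterize(mu, logvar, model.rng)

# Same facial state rendered from two camera directions.
geom_front, tex_front = model.decode(z, [0.0, 0.0, 1.0])
geom_side, tex_side = model.decode(z, [1.0, 0.0, 0.0])
kl = kl_to_standard_normal(mu, logvar)
```

In this sketch the two decodes share the same geometry but yield different textures, which is the property that lets a view-specific texture carry effects such as specularity. A trained model would replace the linear layers with deep networks and fit them to multiview capture data.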
