Deep Energies for Estimating Three-Dimensional Facial Pose and Expression

While much progress has been made in capturing high-quality facial performances using motion capture markers and shape-from-shading, high-end systems typically also rely on rotoscope curves hand-drawn on the image. These curves are subjective and difficult to draw consistently; moreover, ad-hoc procedural methods are required for generating matching rotoscope curves on synthetic renders embedded in the optimization used to determine three-dimensional facial pose and expression. We propose an alternative approach whereby these curves and other keypoints are detected automatically on both the image and the synthetic renders using trained neural networks, eliminating artist subjectivity and the ad-hoc procedures meant to mimic it. More generally, we propose using machine learning networks to implicitly define deep energies which when minimized using classical optimization techniques lead to three-dimensional facial pose and expression estimation.

[1]  Ming-Hsuan Yang,et al.  Multi-objective convolutional learning for face labeling , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[4]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[5]  Takeo Kanade,et al.  Dense 3D face alignment from 2D video for real-time use , 2017, Image Vis. Comput..

[6]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Thomas Brox,et al.  High Accuracy Optical Flow Estimation Based on a Theory for Warping , 2004, ECCV.

[8]  Carlos D. Castillo,et al.  SfSNet: Learning Shape, Reflectance and Illuminance of Faces 'in the Wild' , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Daniel Cohen-Or,et al.  Deep correlations for texture synthesis , 2017, TOGS.

[10]  Manolis I. A. Lourakis,et al.  Is Levenberg-Marquardt the most efficient optimization algorithm for implementing bundle adjustment? , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[11]  Georgios Tzimiropoulos,et al.  Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Yichen Wei,et al.  Integral Human Pose Regression , 2017, ECCV.

[13]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[14]  Jaakko Lehtinen,et al.  Differentiable Monte Carlo ray tracing through edge sampling , 2018, ACM Trans. Graph..

[15]  Jorge Nocedal,et al.  Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization , 1997, TOMS.

[16]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[17]  Didier Stricker,et al.  Flow Fields: Dense Correspondence Fields for Highly Accurate Large Displacement Optical Flow Estimation , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Andrea Vedaldi,et al.  Understanding deep image representations by inverting them , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Yi Yang,et al.  Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[21]  Jiankang Deng,et al.  Cascade Multiview Hourglass Model for Robust 3 D Face Alignment , 2018 .

[22]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Yici Cai,et al.  Look at Boundary: A Boundary-Aware Face Alignment Algorithm , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Michael J. Black,et al.  The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields , 1996, Comput. Vis. Image Underst..

[25]  Sebastian Ruder,et al.  An overview of gradient descent optimization algorithms , 2016, Vestnik komp'iuternykh i informatsionnykh tekhnologii.

[26]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[27]  Andrea Vedaldi,et al.  Deep Image Prior , 2017, International Journal of Computer Vision.

[28]  Yuting Ye,et al.  High fidelity facial animation capture and retargeting with contours , 2013, SCA '13.

[29]  Ming-Hsuan Yang,et al.  Generative Face Completion , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Hao Li,et al.  Real-Time Facial Segmentation and Performance Capture from RGB Input , 2016, ECCV.

[31]  Cordelia Schmid,et al.  DeepFlow: Large Displacement Optical Flow with Deep Matching , 2013, 2013 IEEE International Conference on Computer Vision.

[32]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Zhigang Deng,et al.  Animating blendshape faces by cross-mapping motion capture data , 2006, I3D '06.

[34]  Terrence J. Sejnowski,et al.  SEXNET: A Neural Network Identifies Sex From Human Faces , 1990, NIPS.

[35]  Michael J. Black,et al.  OpenDR: An Approximate Differentiable Renderer , 2014, ECCV.

[36]  Pieter Peers,et al.  Facial performance synthesis using deformation-driven polynomial displacement maps , 2008, SIGGRAPH Asia '08.

[37]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[38]  Louis-Philippe Morency,et al.  Convolutional Experts Constrained Local Model for 3D Facial Landmark Detection , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[39]  Tal Hassner,et al.  On Face Segmentation, Face Swapping, and Face Perception , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[40]  Cordelia Schmid,et al.  EpicFlow: Edge-preserving interpolation of correspondences for optical flow , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[42]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[43]  Oswald Aldrian,et al.  Inverse Rendering of Faces with a 3D Morphable Model , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Gang Hua,et al.  A convolutional neural network cascade for face detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  William J. Christmas,et al.  Dynamic Attention-Controlled Cascaded Shape Regression Exploiting Training Data Augmentation and Fuzzy-Set Sample Weighting , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Shuicheng Yan,et al.  Towards Robust and Accurate Multi-View and Partially-Occluded Face Alignment , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Xiaoming Liu,et al.  Pose-Invariant Face Alignment via CNN-Based Dense 3D Model Fitting , 2017, International Journal of Computer Vision.

[50]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[51]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[52]  Xin Jin,et al.  Face alignment in-the-wild: A Survey , 2016, Comput. Vis. Image Underst..

[53]  Pat Hanrahan,et al.  An efficient representation for irradiance environment maps , 2001, SIGGRAPH.

[54]  Ken-ichi Anjyo,et al.  Practice and Theory of Blendshape Facial Models , 2014, Eurographics.

[55]  Derek Bradley,et al.  An anatomically-constrained local deformation model for monocular face capture , 2016, ACM Trans. Graph..

[56]  Thomas Brox,et al.  Generating Images with Perceptual Similarity Metrics based on Deep Networks , 2016, NIPS.

[57]  Ronald Fedkiw,et al.  Automatic determination of facial muscle activations from sparse motion capture marker data , 2005, SIGGRAPH '05.

[58]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[60]  Lucas Theis,et al.  Fast Face-Swap Using Convolutional Neural Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[61]  Jian Sun,et al.  Joint Cascade Face Detection and Alignment , 2014, ECCV.

[62]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Cheng Li,et al.  Unconstrained Face Alignment via Cascaded Compositional Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Takeo Kanade,et al.  Rotation Invariant Neural Network-Based Face Detection , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[65]  Jaakko Lehtinen,et al.  Production-level facial performance capture using deep convolutional neural networks , 2016, Symposium on Computer Animation.

[66]  Derek Bradley,et al.  User‐Guided Lip Correction for Facial Performance Capture , 2018, Comput. Graph. Forum.

[67]  Paul E. Debevec,et al.  Acquiring the reflectance field of a human face , 2000, SIGGRAPH.

[68]  Davis E. King,et al.  Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res..