Joint Face Detection and Facial Motion Retargeting for Multiple Faces

Facial motion retargeting is an important problem in both computer graphics and vision, which involves capturing the performance of a human face and transferring it to another 3D character. Learning 3D morphable model (3DMM) parameters from 2D face images using convolutional neural networks is common in 2D face alignment, 3D face reconstruction etc. However, existing methods either require an additional face detection step before retargeting or use a cascade of separate networks to perform detection followed by retargeting in a sequence. In this paper, we present a single end-to-end network to jointly predict the bounding box locations and 3DMM parameters for multiple faces. First, we design a novel multitask learning framework that learns a disentangled representation of 3DMM parameters for a single face. Then, we leverage the trained single face model to generate ground truth 3DMM parameters for multiple faces to train another network that performs joint face detection and motion retargeting for images with multiple faces. Experimental results show that our joint detection and retargeting network has high face detection accuracy and is robust to extreme expressions and poses while being faster than state-of-the-art methods.

[1]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Qingshan Liu,et al.  Facial Shape Tracking via Spatio-Temporal Cascade Shape Regression , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[3]  Tal Hassner,et al.  Regressing Robust and Discriminative 3D Morphable Models with a Very Deep Neural Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Erik Learned-Miller,et al.  FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[5]  Stefanos Zafeiriou,et al.  300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[6]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[7]  Kun Zhou,et al.  3D shape regression for real-time facial animation , 2013, ACM Trans. Graph..

[8]  Bailin Deng,et al.  Deep Face Feature for Face Alignment , 2017 .

[9]  Jihun Yu,et al.  Unconstrained realtime facial performance capture , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Xi Zhou,et al.  Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network , 2018, ECCV.

[11]  Rama Chellappa,et al.  HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  B. Heisele Face Detection , 2001 .

[14]  Simon Lucey,et al.  Real-time avatar animation from a single image , 2011, Face and Gesture 2011.

[15]  Yihong Gong,et al.  Tracking Persons-of-Interest via Adaptive Discriminative Features , 2016, ECCV.

[16]  Shuo Yang,et al.  From Facial Parts Responses to Face Detection: A Deep Learning Approach , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Ashraf A. Kassim,et al.  Facial Landmark Detection via Progressive Initialization , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[18]  Yangang Wang,et al.  Online modeling for realtime facial animation , 2013, ACM Trans. Graph..

[19]  Jianfei Cai,et al.  CNN-Based Real-Time Dense Face Reconstruction with Inverse-Rendered Photo-Realistic Face Images , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Jihun Yu,et al.  Realtime facial animation with on-the-fly correctives , 2013, ACM Trans. Graph..

[22]  Jian Sun,et al.  Joint Cascade Face Detection and Alignment , 2014, ECCV.

[23]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[24]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Kun Zhou,et al.  Displaced dynamic expression regression for real-time facial tracking and animation , 2014, ACM Trans. Graph..

[26]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Matan Sela,et al.  Learning Detailed Face Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Hao Li,et al.  Realtime performance-based facial animation , 2011, ACM Trans. Graph..

[29]  Emre Akbas,et al.  MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network , 2018, ECCV.

[30]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[31]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Patrick Pérez,et al.  MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Thomas Vetter,et al.  A morphable model for the synthesis of 3D faces , 1999, SIGGRAPH.

[35]  Lin Cai,et al.  Self-supervised CNN for Unconstrained 3D Facial Performance Capture from an RGB-D Camera , 2018, ArXiv.

[36]  Pascal Fua,et al.  Real-Time Seamless Single Shot 6D Object Pose Prediction , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Xiaoming Liu,et al.  Large-Pose Face Alignment via CNN-Based Dense 3D Model Fitting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Cheng Li,et al.  Face alignment by coarse-to-fine shape searching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  George Trigeorgis,et al.  Joint Multi-View Face Alignment in the Wild , 2017, IEEE Transactions on Image Processing.

[40]  Ivan Laptev,et al.  MobileFace: 3D Face Reconstruction with Efficient CNN Regression , 2018, ECCV Workshops.

[41]  Stefanos Zafeiriou,et al.  The First Facial Landmark Tracking in-the-Wild Challenge: Benchmark and Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[42]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[43]  Simon Lucey,et al.  Real-time avatar animation from a single image , 2011, Face and Gesture 2011.

[44]  Xiaoming Liu,et al.  Nonlinear 3D Face Morphable Model , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Matan Sela,et al.  3D Face Reconstruction by Learning from Synthetic Data , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[46]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[47]  Jaakko Lehtinen,et al.  Production-level facial performance capture using deep convolutional neural networks , 2016, Symposium on Computer Animation.

[48]  Xiaoming Liu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  John P. Lewis,et al.  Facial retargeting with automatic range of motion alignment , 2017, ACM Trans. Graph..

[51]  Erik Hjelmås,et al.  Face Detection: A Survey , 2001, Comput. Vis. Image Underst..

[52]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[53]  Christian Huitema,et al.  Real-time 3D face tracking based on active appearance model constrained by depth data , 2014, Image Vis. Comput..

[54]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[56]  Fernando De la Torre,et al.  Global supervised descent method , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Linda G. Shapiro,et al.  Learning to Generate 3D Stylized Character Expressions from Humans , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[58]  Bailin Deng,et al.  Deep Face Feature for Face Alignment and Reconstruction , 2017, ArXiv.

[59]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[60]  Hao Li,et al.  Learning Dense Facial Correspondences in Unconstrained Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[61]  Marios Savvides,et al.  Faster than Real-Time Facial Alignment: A 3D Spatial Transformer Network Approach in Unconstrained Poses , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).