The Menpo Benchmark for Multi-pose 2D and 3D Facial Landmark Localisation and Tracking

In this article, we present the Menpo 2D and Menpo 3D benchmarks, two new datasets for multi-pose 2D and 3D facial landmark localisation and tracking. In contrast to the previous benchmarks such as 300W and 300VW, the proposed benchmarks contain facial images in both semi-frontal and profile pose. We introduce an elaborate semi-automatic methodology for providing high-quality annotations for both the Menpo 2D and Menpo 3D benchmarks. In Menpo 2D benchmark, different visible landmark configurations are designed for semi-frontal and profile faces, thus making the 2D face alignment full-pose. In Menpo 3D benchmark, a united landmark configuration is designed for both semi-frontal and profile faces based on the correspondence with a 3D face model, thus making face alignment not only full-pose but also corresponding to the real-world 3D space. Based on the considerable number of annotated images, we organised Menpo 2D Challenge and Menpo 3D Challenge for face alignment under large pose variations in conjunction with CVPR 2017 and ICCV 2017, respectively. The results of these challenges demonstrate that recent deep learning architectures, when trained with the abundant data, lead to excellent results. We also provide a very simple, yet effective solution, named Cascade Multi-view Hourglass Model, to 2D and 3D face alignment. In our method, we take advantage of all 2D and 3D facial landmark annotations in a joint way. We not only capitalise on the correspondences between the semi-frontal and profile 2D facial landmarks but also employ joint supervision from both 2D and 3D facial landmarks. Finally, we discuss future directions on the topic of face alignment.

[1]  Josef Kittler,et al.  Face Detection, Bounding Box Aggregation and Pose Estimation for Robust Facial Landmark Localisation in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[2]  Shiguang Shan,et al.  Robust FEC-CNN: A High Accuracy Facial Landmark Detection System , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[3]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[4]  Horst Bischof,et al.  Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[5]  Stefanos Zafeiriou,et al.  300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[6]  Stefanos Zafeiriou,et al.  A Comprehensive Performance Evaluation of Deformable Face Tracking “In-the-Wild” , 2016, International Journal of Computer Vision.

[7]  Stefanos Zafeiriou,et al.  Large Scale 3D Morphable Models , 2017, International Journal of Computer Vision.

[8]  László A. Jeni,et al.  The First 3D Face Alignment in the Wild (3DFAW) Challenge , 2016, ECCV Workshops.

[9]  Lale Akarun,et al.  A Database of Non-Manual Signs in Turkish Sign Language , 2007, 2007 IEEE 15th Signal Processing and Communications Applications.

[10]  Stefanos Zafeiriou,et al.  A 3D Morphable Model Learnt from 10,000 Faces , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Cheng Li,et al.  Unconstrained Face Alignment via Cascaded Compositional Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  George Trigeorgis,et al.  The Menpo Facial Landmark Localisation Challenge: A Step Towards the Solution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[13]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Amnon Shashua,et al.  Learning a Metric Embedding for Face Recognition using the Multibatch Method , 2016, NIPS.

[15]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Guoqing Li,et al.  Combining Local and Global Features for 3D Face Tracking , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[17]  George Trigeorgis,et al.  Mnemonic Descent Method: A Recurrent Process Applied for End-to-End Face Alignment , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  George Trigeorgis,et al.  3D Face Morphable Models "In-the-Wild" , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Thomas F. Coleman,et al.  A Reflective Newton Method for Minimizing a Quadratic Function Subject to Bounds on Some of the Variables , 1992, SIAM J. Optim..

[20]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Wenyan Wu,et al.  Leveraging Intra and Inter-Dataset Variations for Robust Face Alignment , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[22]  Jiri Matas,et al.  XM2VTSDB: The Extended M2VTS Database , 1999 .

[23]  Marek Kowalski,et al.  Deep Alignment Network: A Convolutional Neural Network for Robust Face Alignment , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[24]  Stefanos Zafeiriou,et al.  Incremental Face Alignment in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[26]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[27]  Jiankang Deng,et al.  Cascade Multiview Hourglass Model for Robust 3 D Face Alignment , 2018 .

[28]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[31]  Fernando De la Torre,et al.  Global supervised descent method , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[33]  Xiaoou Tang,et al.  Learning Deep Representation for Face Alignment with Auxiliary Attributes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Joon Son Chung,et al.  Lip Reading in the Wild , 2016, ACCV.

[35]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[36]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[37]  Iasonas Kokkinos,et al.  DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Timothy F. Cootes,et al.  Feature Detection and Tracking with Constrained Local Models , 2006, BMVC.

[39]  George Trigeorgis,et al.  The 3D Menpo Facial Landmark Tracking Challenge , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[40]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[41]  Patrick J. Flynn,et al.  Overview of the face recognition grand challenge , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[42]  Stefanos Zafeiriou,et al.  Statistical non-rigid ICP algorithm and its application to 3D face alignment , 2017, Image Vis. Comput..

[43]  Maja Pantic,et al.  Joint Facial Action Unit Detection and Feature Fusion: A Multi-conditional Learning Approach. , 2016, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.

[44]  Qingshan Liu,et al.  Stacked Hourglass Network for Robust Facial Landmark Localisation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[45]  Jake K. Aggarwal,et al.  Structure from Motion of Rigid and Jointed Objects , 1981, Artif. Intell..

[46]  Qingshan Liu,et al.  Facial Shape Tracking via Spatio-Temporal Cascade Shape Regression , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[47]  Erik Learned-Miller,et al.  FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[48]  Daniel E. Crispell,et al.  Pix2Face: Direct 3D Face Model Estimation , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[49]  Maja Pantic,et al.  Optimization Problems for Fast AAM Fitting in-the-Wild , 2013, 2013 IEEE International Conference on Computer Vision.

[50]  Iasonas Kokkinos,et al.  DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Simon Lucey,et al.  Deformable Model Fitting by Regularized Landmark Mean-Shift , 2010, International Journal of Computer Vision.

[52]  Yuning Jiang,et al.  Extensive Facial Landmark Localization with Coarse-to-Fine Convolutional Network Cascade , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[53]  Klaus J. Kirchberg,et al.  Robust Face Detection Using the Hausdorff Distance , 2001, AVBPA.

[54]  Louis-Philippe Morency,et al.  Convolutional Experts Constrained Local Model for Facial Landmark Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[55]  Adam Schmidt,et al.  The put face database , 2008 .

[56]  Nicu Sebe,et al.  The First 3D Face Alignment in the Wild (3DFAW) Challenge , 2016, ECCV Workshops.

[57]  Xi Chen,et al.  Delving Deep Into Coarse-to-Fine Framework for Facial Landmark Localization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[58]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  Louis-Philippe Morency,et al.  Convolutional Experts Constrained Local Model for 3D Facial Landmark Detection , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[60]  Junjie Yan,et al.  Learn to Combine Multiple Hypotheses for Accurate Face Alignment , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[61]  Joon Son Chung,et al.  Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Fred Nicolls,et al.  Active shape models with SIFT descriptors and MARS , 2015, 2014 International Conference on Computer Vision Theory and Applications (VISAPP).

[63]  Xi Zhou,et al.  Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network , 2018, ECCV.

[64]  Charless C. Fowlkes,et al.  Occlusion Coherence: Detecting and Localizing Occluded Faces , 2015, ArXiv.

[65]  Cheng Cheng,et al.  Unconstrained Face Alignment Without Face Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[66]  Stefanos Zafeiriou,et al.  A Unified Framework for Compositional Fitting of Active Appearance Models , 2016, International Journal of Computer Vision.

[67]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[68]  Pietro Perona,et al.  Robust Face Landmark Estimation under Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[69]  Timothy F. Cootes,et al.  Active Shape Models-Their Training and Application , 1995, Comput. Vis. Image Underst..

[70]  Ersin Yumer,et al.  Neural Face Editing with Intrinsic Image Disentangling , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[71]  Xiaoming Liu,et al.  Dense Face Alignment , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[72]  Stefanos Zafeiriou,et al.  Cascade Multi-View Hourglass Model for Robust 3D Face Alignment , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[73]  Georgios Tzimiropoulos,et al.  Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[74]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[75]  Georgios Tzimiropoulos,et al.  Two-Stage Convolutional Part Heatmap Regression for the 1st 3D Face Alignment in the Wild (3DFAW) Challenge , 2016, ECCV Workshops.

[76]  Ashraf A. Kassim,et al.  3D-Assisted Coarse-to-Fine Extreme-Pose Facial Landmark Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[77]  David J. Kriegman,et al.  Localizing parts of faces using a consensus of exemplars , 2011, CVPR.

[78]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[79]  M. Pantic,et al.  Faces InThe-Wild Challenge : Database and Results , 2016 .

[80]  Maja Pantic,et al.  Gaussian Process Domain Experts for Model Adaptation in Facial Behavior Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[81]  Sina Honari,et al.  Improving Landmark Localization with Semi-Supervised Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[82]  Stefanos Zafeiriou,et al.  Feature-Based Lucas–Kanade and Active Appearance Models , 2015, IEEE Transactions on Image Processing.

[83]  Cheng Li,et al.  Face alignment by coarse-to-fine shape searching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[84]  Stephen Milborrow The MUCT Landmarked Face Database , 2010 .

[85]  George Trigeorgis,et al.  Joint Multi-View Face Alignment in the Wild , 2017, IEEE Transactions on Image Processing.

[86]  Qingshan Liu,et al.  M3 CSR: Multi-view, multi-scale and multi-component cascade shape regression , 2016, Image Vis. Comput..

[87]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[88]  Stefanos Zafeiriou,et al.  Offline Deformable Face Tracking in Arbitrary Videos , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[89]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[90]  Stefanos Zafeiriou,et al.  The First Facial Landmark Tracking in-the-Wild Challenge: Benchmark and Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[91]  Haoqiang Fan,et al.  Approaching human level facial landmark localization by deep learning , 2016, Image Vis. Comput..

[92]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[93]  Takeo Kanade,et al.  Multi-PIE , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.