Heterogenous output regression network for direct face alignment

Abstract

Face alignment has gained great popularity in computer vision due to its widespread applications. In this paper, we propose a novel learning architecture, the heterogenous output regression network (HORNet), for face alignment, which directly predicts facial landmarks from images. HORNet is based on kernel approximations and establishes a new compact multi-layer architecture. A nonlinear layer with cosine activations disentangles the nonlinear relationships between image representations and facial landmark shapes. A linear layer with identity activations explicitly encodes landmark correlations through low-rank learning via matrix elastic nets. HORNet is highly flexible: it can work either with pre-built feature representations or with convolutional architectures for end-to-end learning. HORNet combines the strength of kernel methods in modeling nonlinearities with that of neural networks in structured prediction, which makes it both effective and efficient for direct face alignment. Extensive experiments on five in-the-wild datasets show that HORNet achieves high performance and consistently outperforms state-of-the-art methods.
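The two layers described above can be illustrated with a small NumPy sketch. It is not the authors' implementation and rests on several assumptions:

- The cosine layer is taken to approximate a Gaussian kernel with random Fourier features (in the sense of Rahimi and Recht), so it computes z(x) = sqrt(2/D) cos(W^T x + b).
- The matrix elastic net is taken to be the penalty lam1 * ||B||_* + (lam2 / 2) * ||B||_F^2 on the weight matrix B of the linear layer.
- That objective is solved here by proximal gradient descent with singular value thresholding.

The function names and the hyperparameters `sigma`, `lam1` and `lam2` are hypothetical. The sketch also keeps the cosine-layer frequencies fixed rather than learning them.

```python
import numpy as np


def sample_fourier_params(d, D, sigma, rng):
    """Draw frequencies/phases so the cosine layer approximates a Gaussian kernel.

    Assumption: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), whose spectral
    density (Bochner's theorem) is N(0, sigma^-2 I).
    """
    W = rng.normal(scale=1.0 / sigma, size=(d, D))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)       # random phases
    return W, b


def cosine_layer(X, W, b):
    """Nonlinear layer with cosine activations: z(x) = sqrt(2/D) cos(W^T x + b)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)


def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * ||M||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def fit_matrix_elastic_net(Z, Y, lam1, lam2, n_iter=500):
    """Linear layer with identity activations, fitted by proximal gradient on

        0.5 * ||Y - Z B||_F^2 + lam1 * ||B||_* + 0.5 * lam2 * ||B||_F^2
    """
    B = np.zeros((Z.shape[1], Y.shape[1]))
    step = 1.0 / np.linalg.norm(Z, 2) ** 2  # 1 / Lipschitz constant of the data term
    for _ in range(n_iter):
        grad = Z.T @ (Z @ B - Y)            # gradient of the squared loss
        # Prox of the elastic-net penalty: shrink singular values, then rescale.
        B = svt(B - step * grad, step * lam1) / (1.0 + step * lam2)
    return B
```

For landmark regression, the rows of `X` would hold image features (pre-built or CNN outputs) and the rows of `Y` the flattened landmark coordinates. Predictions would then be `cosine_layer(X_test, W, b) @ B`. The nuclear-norm term pushes `B` toward low rank, which is one way to make the predicted coordinates share structure rather than being regressed independently.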
