Aggregation via Separation: Boosting Facial Landmark Detector With Semi-Supervised Style Translation

Facial landmark detection, or face alignment, is a fundamental task that has been extensively studied. In this paper, we investigate a new perspective of facial landmark detection and demonstrate it leads to further notable improvement. Given that any face images can be factored into space of style that captures lighting, texture and image environment, and a style-invariant structure space, our key idea is to leverage disentangled style and shape space of each individual to augment existing structures via style translation. With these augmented synthetic samples, our semi-supervised model surprisingly outperforms the fully-supervised one by a large margin. Extensive experiments verify the effectiveness of our idea with state-of-the-art results on WFLW, 300W, COFW, and AFLW datasets. Our proposed structure is general and could be assembled into any face alignment frameworks. The code is made publicly available at https://github.com/thesouthfrog/stylealign.

[1]  Yi Yang,et al.  Style Aggregated Network for Facial Landmark Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Stefanos Zafeiriou,et al.  300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[3]  Jiri Matas,et al.  XM2VTSDB: The Extended M2VTS Database , 1999 .

[4]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[5]  Yann LeCun,et al.  Convolutional Learning of Spatio-temporal Features , 2010, ECCV.

[6]  Tim K Marks,et al.  Robust Face Alignment Using a Mixture of Invariant Experts , 2016, ECCV.

[7]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[9]  Simon Baker,et al.  Active Appearance Models Revisited , 2004, International Journal of Computer Vision.

[10]  Xiaoming Liu,et al.  Pose-Invariant Face Alignment with a Single CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[12]  Cheng Li,et al.  Unconstrained Face Alignment via Cascaded Compositional Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  George Trigeorgis,et al.  The Menpo Facial Landmark Localisation Challenge: A Step Towards the Solution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[14]  Rasmus Larsen,et al.  An Active Illumination and Appearance (AIA) Model for Face Alignment , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Yici Cai,et al.  Look at Boundary: A Boundary-Aware Face Alignment Algorithm , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Timothy F. Cootes,et al.  Feature Detection and Tracking with Constrained Local Models , 2006, BMVC.

[17]  Xi Zhou,et al.  Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network , 2018, ECCV.

[18]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[19]  Alexei A. Efros,et al.  Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20]  Georgios Tzimiropoulos,et al.  Synergy between Face Alignment and Tracking via Discriminative Global Consensus Optimization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Rama Chellappa,et al.  Disentangling 3D Pose in a Dendritic CNN for Unconstrained 2D Face Alignment , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Roland Göcke,et al.  A Nonlinear Discriminative Approach to AAM Fitting , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[23]  Alexei A. Efros,et al.  Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Gerhard Rigoll,et al.  Robust Facial Landmark Detection via a Fully-Convolutional Local-Global Context Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Wenyan Wu,et al.  Leveraging Intra and Inter-Dataset Variations for Robust Face Alignment , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[26]  Sina Honari,et al.  Improving Landmark Localization with Semi-Supervised Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Yorgos Tzimiropoulos,et al.  Bulat , Adrian and Tzimiropoulos , Georgios ( 2016 ) Convolutional aggregation of local evidence for large pose face alignment , 2017 .

[28]  Gregory Shakhnarovich,et al.  Colorization as a Proxy Task for Visual Understanding , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Charless C. Fowlkes,et al.  Occlusion Coherence: Localizing Occluded Faces with a Hierarchical Deformable Part Model , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Xiaoming Liu,et al.  Disentangled Representation Learning GAN for Pose-Invariant Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Ashraf A. Kassim,et al.  Recurrent 3D-2D Dual Learning for Large-Pose Facial Landmark Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  Hanjiang Lai,et al.  Robust Facial Landmark Detection via Recurrent Attentive-Refinement Networks , 2016, ECCV.

[33]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Jan Kautz,et al.  Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.

[35]  Nitish Srivastava Unsupervised Learning of Visual Representations using Videos , 2015 .

[36]  Fernando De la Torre,et al.  Global supervised descent method , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[38]  Bhiksha Raj,et al.  SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Xiaoou Tang,et al.  Learning Deep Representation for Face Alignment with Auxiliary Attributes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Frédo Durand,et al.  Synthesizing Images of Humans in Unseen Poses , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  David J. Kriegman,et al.  Localizing Parts of Faces Using a Consensus of Exemplars , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Takeo Kanade,et al.  Multi-PIE , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[44]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[45]  Cheng Cheng,et al.  A Deep Regression Architecture with Two-Stage Re-initialization for High Performance Facial Landmark Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Francesc Moreno-Noguer,et al.  Unsupervised Person Image Synthesis in Arbitrary Poses , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[50]  Leonidas J. Guibas,et al.  Taskonomy: Disentangling Task Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[51]  George Trigeorgis,et al.  Mnemonic Descent Method: A Recurrent Process Applied for End-to-End Face Alignment , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  B. Heisele Face Detection , 2001 .

[53]  Josef Kittler,et al.  Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[54]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Björn Ommer,et al.  A Variational U-Net for Conditional Appearance and Shape Generation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[56]  Sebastian Thrun,et al.  Unsupervised learning of invariant features using video , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[57]  Maneesh Kumar Singh,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2019, International Journal of Computer Vision.

[58]  Heng Huang,et al.  Direct Shape Regression Networks for End-to-End Face Alignment , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[59]  Timothy F. Cootes,et al.  Active Appearance Models , 1998, ECCV.

[60]  Yoshua Bengio,et al.  Mutual Information Neural Estimation , 2018, ICML.

[61]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[62]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[63]  William J. Christmas,et al.  Cascaded Collaborative Regression for Robust Facial Landmark Detection Trained Using a Mixture of Synthetic and Real Images With Dynamic Weighting , 2015, IEEE Transactions on Image Processing.

[64]  Gregory Shakhnarovich,et al.  Learning Representations for Automatic Colorization , 2016, ECCV.

[65]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[66]  Peter Földiák,et al.  Learning Invariance from Transformation Sequences , 1991, Neural Comput..

[67]  José Miguel Buenaposada,et al.  A Deeply-Initialized Coarse-to-fine Ensemble of Regression Trees for Face Alignment , 2018, ECCV.

[68]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[69]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[70]  Horst Bischof,et al.  Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[71]  Jian Sun,et al.  Joint Cascade Face Detection and Alignment , 2014, ECCV.

[72]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[73]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[74]  Pietro Perona,et al.  Robust Face Landmark Estimation under Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[75]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[76]  Justus Thies,et al.  Face2Face: real-time face capture and reenactment of RGB videos , 2019, Commun. ACM.

[77]  Shiguang Shan,et al.  Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment , 2014, ECCV.

[78]  Cheng Li,et al.  Face alignment by coarse-to-fine shape searching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[79]  George Trigeorgis,et al.  Joint Multi-View Face Alignment in the Wild , 2017, IEEE Transactions on Image Processing.

[80]  Stefanos Zafeiriou,et al.  The First Facial Landmark Tracking in-the-Wild Challenge: Benchmark and Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[81]  Sina Honari,et al.  Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[82]  Jan Kautz,et al.  Unsupervised Image-to-Image Translation Networks , 2017, NIPS.

[83]  Amos J. Storkey,et al.  Data Augmentation Generative Adversarial Networks , 2017, ICLR 2018.

[84]  Terrence J. Sejnowski,et al.  Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.

[85]  Luc Van Gool,et al.  Pose Guided Person Image Generation , 2017, NIPS.

[86]  Shenghuo Zhu,et al.  Deep Learning of Invariant Features via Simulated Fixations in Video , 2012, NIPS.