Attention-Driven Cropping for Very High Resolution Facial Landmark Detection

Facial landmark detection is a fundamental task for many consumer and high-end applications and is almost entirely solved by machine learning methods today. Existing datasets used to train such algorithms are primarily made up of only low resolution images, and current algorithms are limited to inputs of comparable quality and resolution as the training dataset. On the other hand, high resolution imagery is becoming increasingly more common as consumer cameras improve in quality every year. Therefore, there is need for algorithms that can leverage the rich information available in high resolution imagery. Naively attempting to reuse existing network architectures on high resolution imagery is prohibitive due to memory bottlenecks on GPUs. The only current solution is to downsample the images, sacrificing resolution and quality. Building on top of recent progress in attention-based networks, we present a novel, fully convolutional regional architecture that is specially designed for predicting landmarks on very high resolution facial images without downsampling. We demonstrate the flexibility of our architecture by training the proposed model with images of resolutions ranging from 256 x 256 to 4K. In addition to being the first method for facial landmark detection on high resolution images, our approach achieves superior performance over traditional (holistic) state-of-the-art architectures across ALL resolutions, leading to a general-purpose, extremely flexible, high quality landmark detector.

[1]  Feng Zhou,et al.  Deep Deformation Network for Object Landmark Localization , 2016, ECCV.

[2]  Qingshan Liu,et al.  Stacked Hourglass Network for Robust Facial Landmark Localisation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[3]  Zuochang Ye,et al.  FAB: A Robust Facial Landmark Detection Framework for Motion-Blurred Videos , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[5]  Xiaoming Liu,et al.  Large-Pose Face Alignment via CNN-Based Dense 3D Model Fitting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Jiwen Lu,et al.  Two-Stream Transformer Networks for Video-Based Face Alignment , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[8]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[10]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Jonathan Tompson,et al.  Efficient object localization using Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[13]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[14]  George Trigeorgis,et al.  Mnemonic Descent Method: A Recurrent Process Applied for End-to-End Face Alignment , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Rama Chellappa,et al.  HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Thabo Beeler,et al.  High-quality single-shot capture of facial geometry , 2010, ACM Trans. Graph..

[17]  Ioannis A. Kakadiaris,et al.  Joint Head Pose Estimation and Face Alignment Framework Using Global and Local CNN Features , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[18]  Yuan Dong,et al.  Adaptive Cascade Deep Convolutional Neural Networks for face alignment , 2015, Comput. Stand. Interfaces.

[19]  Josef Kittler,et al.  Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Georgios Tzimiropoulos,et al.  Project-Out Cascaded Regression with an application to face alignment , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Yuning Jiang,et al.  Extensive Facial Landmark Localization with Coarse-to-Fine Convolutional Network Cascade , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[22]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[24]  Xiaoming Liu,et al.  Pose-Invariant Face Alignment with a Single CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[26]  George Trigeorgis,et al.  The Menpo Facial Landmark Localisation Challenge: A Step Towards the Solution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[27]  Marios Savvides,et al.  Faster than Real-Time Facial Alignment: A 3D Spatial Transformer Network Approach in Unconstrained Poses , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Cheng Li,et al.  Face alignment by coarse-to-fine shape searching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  George Trigeorgis,et al.  Joint Multi-View Face Alignment in the Wild , 2017, IEEE Transactions on Image Processing.

[32]  Yi Yang,et al.  Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Stefanos Zafeiriou,et al.  The First Facial Landmark Tracking in-the-Wild Challenge: Benchmark and Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[34]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Pavlo Molchanov,et al.  Hand Pose Estimation via Latent 2.5D Heatmap Regression , 2018, ECCV.

[37]  Heng Huang,et al.  Direct Shape Regression Networks for End-to-End Face Alignment , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Mingjie Zheng,et al.  Robust Facial Landmark Detection via Occlusion-Adaptive Deep Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[40]  Xiaoou Tang,et al.  Learning Deep Representation for Face Alignment with Auxiliary Attributes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[42]  Xiaogang Wang,et al.  Deep Convolutional Network Cascade for Facial Point Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Robert B. Fisher,et al.  3D Visual passcode: Speech-driven 3D facial dynamics for behaviometrics , 2019, Signal Process..

[44]  Ashraf A. Kassim,et al.  Recurrent 3D-2D Dual Learning for Large-Pose Facial Landmark Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[45]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Marek Kowalski,et al.  Deep Alignment Network: A Convolutional Neural Network for Robust Face Alignment , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[47]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[48]  Cheng Cheng,et al.  A Deep Regression Architecture with Two-Stage Re-initialization for High Performance Facial Landmark Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Yi Yang,et al.  Style Aggregated Network for Facial Landmark Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Yi Yang,et al.  Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[51]  Xiaoming Liu,et al.  Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos , 2019, AAAI.

[52]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.