Brute-Force Facial Landmark Analysis With A 140, 000-Way Classifier

We propose a simple approach to visual alignment, focusing on the illustrative task of facial landmark estimation. While most prior work treats this as a regression problem, we instead formulate it as a discrete $K$-way classification task, where a classifier is trained to return one of $K$ discrete alignments. One crucial benefit of a classifier is the ability to report back a (softmax) distribution over putative alignments. We demonstrate that this distribution is a rich representation that can be marginalized (to generate uncertainty estimates over groups of landmarks) and conditioned on (to incorporate top-down context, provided by temporal constraints in a video stream or an interactive human user). Such capabilities are difficult to integrate into classic regression-based approaches. We study performance as a function of the number of classes $K$, including the extreme "exemplar class" setting where $K$ is equal to the number of training examples (140K in our setting). Perhaps surprisingly, we show that classifiers can still be learned in this setting. When compared to prior work in classification, our $K$ is unprecedentedly large, including many "fine-grained" classes that are very similar. We address these issues by using a multi-label loss function that allows for training examples to be non-uniformly shared across discrete classes. We perform a comprehensive experimental analysis of our method on standard benchmarks, demonstrating state-of-the-art results for facial alignment in videos.

[1]  Timothy F. Cootes,et al.  Active Shape Models - 'smart snakes' , 1992, BMVC.

[2]  Timothy F. Cootes,et al.  Active Shape Model Search using Local Grey-Level Models: A Quantitative Evaluation , 1993, BMVC.

[3]  Timothy F. Cootes,et al.  Active Appearance Models , 1998, ECCV.

[4]  Simon Baker,et al.  Active Appearance Models Revisited , 2004, International Journal of Computer Vision.

[5]  Simon Lucey,et al.  Deformable Model Fitting by Regularized Landmark Mean-Shift , 2010, International Journal of Computer Vision.

[6]  Peter Robinson,et al.  3D Constrained Local Model for rigid and non-rigid facial tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[8]  Stefanos Zafeiriou,et al.  300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[9]  Ioannis Patras,et al.  Sieving Regression Forest Votes for Facial Feature Detection in the Wild , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Jonathon Shlens,et al.  Fast, Accurate Detection of 100,000 Object Classes on a Single Machine , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Junzhou Huang,et al.  Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model , 2013, 2013 IEEE International Conference on Computer Vision.

[13]  Kun Zhou,et al.  3D shape regression for real-time facial animation , 2013, ACM Trans. Graph..

[14]  Enver Sangineto,et al.  Pose and Expression Independent Facial Landmark Localization Using Dense-SURF and the Hausdorff Distance , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Dragomir Anguelov,et al.  Capturing Long-Tail Distributions of Object Subcategories , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[20]  Stefanos Zafeiriou,et al.  The First Facial Landmark Tracking in-the-Wild Challenge: Benchmark and Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[21]  Qiang Ji,et al.  Shape Augmented Regression Method for Face Alignment , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[22]  Timothy F. Cootes,et al.  Multi-view Constrained Local Models for Large Head Angle Facial Tracking , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[23]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[24]  Fernando De la Torre,et al.  Global supervised descent method , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Georgios Tzimiropoulos,et al.  Project-Out Cascaded Regression with an application to face alignment , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Ashraf A. Kassim,et al.  Facial Landmark Detection via Progressive Initialization , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[27]  Cheng Li,et al.  Face alignment by coarse-to-fine shape searching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Václav Hlavác,et al.  Facial Landmark Tracking by Tree-Based Deformable Part Model Based Detector , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[29]  Qingshan Liu,et al.  Facial Shape Tracking via Spatio-Temporal Cascade Shape Regression , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[30]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Shiguang Shan,et al.  Occlusion-Free Face Alignment: Deep Regression Networks Coupled with De-Corrupt AutoEncoders , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Qingshan Liu,et al.  M3 CSR: Multi-view, multi-scale and multi-component cascade shape regression , 2016, Image Vis. Comput..

[33]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Michel F. Valstar,et al.  Cascaded Continuous Regression for Real-time Incremental Face Tracking , 2016, ECCV.

[35]  Allan Jabri,et al.  Learning Visual Features from Large Weakly Supervised Data , 2015, ECCV.

[36]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[37]  George Trigeorgis,et al.  Mnemonic Descent Method: A Recurrent Process Applied for End-to-End Face Alignment , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Ira Kemelmacher-Shlizerman,et al.  The MegaFace Benchmark: 1 Million Faces for Recognition at Scale , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Stefanos Zafeiriou,et al.  300 Faces In-The-Wild Challenge: database and results , 2016, Image Vis. Comput..

[40]  Rogério Schmidt Feris,et al.  A Recurrent Encoder-Decoder Network for Sequential Face Alignment , 2016, ECCV.

[41]  Haoqiang Fan,et al.  Approaching human level facial landmark localization by deep learning , 2016, Image Vis. Comput..

[42]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Stefanos Zafeiriou,et al.  A Comprehensive Performance Evaluation of Deformable Face Tracking “In-the-Wild” , 2016, International Journal of Computer Vision.