Unconstrained Still/Video-Based Face Verification with Deep Convolutional Neural Networks

Over the last 5 years, methods based on Deep Convolutional Neural Networks (DCNNs) have shown impressive performance improvements for object detection and recognition problems. This has been made possible due to the availability of large annotated datasets, a better understanding of the non-linear mapping between input images and class labels as well as the affordability of GPUs. In this paper, we present the design details of a deep learning system for unconstrained face recognition, including modules for face detection, association, alignment and face verification. The quantitative performance evaluation is conducted using the IARPA Janus Benchmark A (IJB-A), the JANUS Challenge Set 2 (JANUS CS2), and the Labeled Faces in the Wild (LFW) dataset. The IJB-A dataset includes real-world unconstrained faces of 500 subjects with significant pose and illumination variations which are much harder than the LFW and Youtube Face datasets. JANUS CS2 is the extended version of IJB-A which contains not only all the images/frames of IJB-A but also includes the original videos. Some open issues regarding DCNNs for face verification problems are then discussed.

[1]  Junjie Yan,et al.  Face detection by structural models , 2014, Image Vis. Comput..

[2]  Anil K. Jain,et al.  Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Swami Sankaranarayanan,et al.  Triplet Similarity Embedding for Face Verification , 2016, ArXiv.

[4]  Jean-Marc Odobez,et al.  Track Creation and Deletion Framework for Long-Term Online Multiface Tracking , 2013, IEEE Transactions on Image Processing.

[5]  Jianguo Li,et al.  Learning SURF Cascade for Fast and Accurate Object Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Takeo Kanade,et al.  Multi-PIE , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[7]  Jian Sun,et al.  Joint Cascade Face Detection and Alignment , 2014, ECCV.

[8]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[9]  Tal Hassner,et al.  Effective face frontalization in unconstrained images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Rama Chellappa,et al.  Unconstrained face verification using deep CNN features , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[11]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Kuk-Jin Yoon,et al.  Robust Online Multi-object Tracking Based on Tracklet Confidence and Online Discriminative Appearance Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Xiaogang Wang,et al.  Deeply learned face representations are sparse, selective, and robust , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[15]  Jiwen Lu,et al.  Discriminative Deep Metric Learning for Face Verification in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[17]  Pietro Perona,et al.  Robust Face Landmark Estimation under Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[18]  Rama Chellappa,et al.  Face Association across Unconstrained Video Frames Using Conditional Random Fields , 2012, ECCV.

[19]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Xiaogang Wang,et al.  DeepID3: Face Recognition with Very Deep Neural Networks , 2015, ArXiv.

[22]  Shengcai Liao,et al.  Learning Face Representation from Scratch , 2014, ArXiv.

[23]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Frédéric Jurie,et al.  PCCA: A new approach for distance learning from sparse pairwise constraints , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Stefanos Zafeiriou,et al.  Robust Discriminative Response Map Fitting with Constrained Local Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Ramakant Nevatia,et al.  Face recognition using deep multi-pose representations , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[27]  Rui Caseiro,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence High-speed Tracking with Kernelized Correlation Filters , 2022 .

[28]  Carlos D. Castillo,et al.  Deep Heterogeneous Feature Fusion for Template-Based Face Recognition , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[29]  Xiaogang Wang,et al.  Deep Convolutional Network Cascade for Facial Point Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Ming-Hsuan Yang,et al.  Visual tracking with online Multiple Instance Learning , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[32]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[33]  Rama Chellappa,et al.  HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Subhransu Maji,et al.  One-to-many face recognition with bilinear CNNs , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[35]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Tal Hassner,et al.  The One-Shot similarity kernel , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[37]  Erik Learned-Miller,et al.  FDDB: A benchmark for face detection in unconstrained settings , 2010 .

[38]  Timothy F. Cootes,et al.  Feature Detection and Tracking with Constrained Local Models , 2006, BMVC.

[39]  Anil K. Jain,et al.  Unconstrained face detection: State of the art baseline and challenges , 2015, 2015 International Conference on Biometrics (ICB).

[40]  Sander Stuijk,et al.  Online multi-face detection and tracking using detector confidence and structured SVMs , 2015, 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[41]  Tal Hassner,et al.  Do We Really Need to Collect Millions of Faces for Effective Face Recognition? , 2016, ECCV.

[42]  Luc Van Gool,et al.  Robust tracking-by-detection using a detector confidence particle filter , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[43]  David K. Smith Network Flows: Theory, Algorithms, and Applications , 1994 .

[44]  Jian Sun,et al.  Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Tal Hassner,et al.  Multiple One-Shots for Utilizing Class Label Information , 2009, BMVC.

[46]  Matti Pietikäinen,et al.  Face Description with Local Binary Patterns: Application to Face Recognition , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[48]  Jitendra Malik,et al.  Deformable part models are convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Rama Chellappa,et al.  A deep pyramid Deformable Part Model for face detection , 2015, 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[50]  Shiguang Shan,et al.  Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment , 2014, ECCV.

[51]  Pietro Perona,et al.  Cascaded pose regression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[52]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Li Bai,et al.  Cosine Similarity Metric Learning for Face Verification , 2010, ACCV.

[54]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[55]  Gang Hua,et al.  Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation , 2013, 2013 IEEE International Conference on Computer Vision.

[56]  Cheng Li,et al.  Face alignment by coarse-to-fine shape searching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Rama Chellappa,et al.  Face Alignment by Local Deep Descriptor Regression , 2016, ArXiv.

[58]  S. Mallat,et al.  Invariant Scattering Convolution Networks , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Qiang Ji,et al.  Robust Face Tracking via Collaboration of Generic and Specific Models , 2008, IEEE Transactions on Image Processing.

[60]  René Vidal,et al.  Global Optimality in Tensor Factorization, Deep Learning, and Beyond , 2015, ArXiv.

[61]  Andrew Zisserman,et al.  Fisher Vector Faces in the Wild , 2013, BMVC.

[62]  Ming-Hsuan Yang,et al.  Bayesian Multi-object Tracking Using Motion Context from Multiple Objects , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[63]  Carlos D. Castillo,et al.  Triplet probabilistic embedding for face verification and clustering , 2016, 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[64]  Luc Van Gool,et al.  Face Detection without Bells and Whistles , 2014, ECCV.

[65]  Lin Xiong,et al.  A Good Practice Towards Top Performance of Face Recognition: Transferred Deep Feature Fusion , 2017, ArXiv.

[66]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[67]  Bin Yang,et al.  Convolutional Channel Features , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[68]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[69]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[70]  Dongqing Zhang,et al.  Neural Aggregation Network for Video Face Recognition , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[71]  David J. Kriegman,et al.  Localizing parts of faces using a consensus of exemplars , 2011, CVPR.

[72]  Qiong Cao,et al.  Template Adaptation for Face Verification and Identification , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[73]  Stéphane Mallat,et al.  Understanding deep convolutional networks , 2016, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[74]  Jian Sun,et al.  Bayesian Face Revisited: A Joint Formulation , 2012, ECCV.

[75]  Rama Chellappa,et al.  Dictionary-Based Face Recognition from Video , 2012, ECCV.

[76]  Gang Hua,et al.  A convolutional neural network cascade for face detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[77]  Swami Sankaranarayanan,et al.  Unconstrained face verification using fisher vectors computed from frontalized faces , 2015, 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[78]  L. Darrell Whitley,et al.  Adaptive Appearance Model and Condensation Algorithm for Robust Face Tracking , 2010, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[79]  Carlos D. Castillo,et al.  L2-constrained Softmax Loss for Discriminative Face Verification , 2017, ArXiv.

[80]  Carlos D. Castillo,et al.  An All-In-One Convolutional Neural Network for Face Analysis , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[81]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[82]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[83]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[84]  Timothy F. Cootes,et al.  Active Shape Models-Their Training and Application , 1995, Comput. Vis. Image Underst..

[85]  Ramakant Nevatia,et al.  Robust multi-pose face tracking by multi-stage tracklet association , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[86]  Li-Jia Li,et al.  Multi-view Face Detection Using Deep Convolutional Neural Networks , 2015, ICMR.

[87]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[88]  Dima Damen,et al.  Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[89]  Jun-Cheng Chen,et al.  An End-to-End System for Unconstrained Face Verification with Deep Convolutional Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[90]  Ramakant Nevatia,et al.  Robust Object Tracking by Hierarchical Association of Detection Responses , 2008, ECCV.

[91]  Inderjit S. Dhillon,et al.  Information-theoretic metric learning , 2006, ICML '07.

[92]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[93]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[94]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[95]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[96]  Guillermo Sapiro,et al.  On the Stability of Deep Networks , 2014, ICLR.

[97]  Anil K. Jain,et al.  Face Search at Scale: 80 Million Gallery , 2015, ArXiv.

[98]  Shuo Yang,et al.  From Facial Parts Responses to Face Detection: A Deep Learning Approach , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[99]  Maja Pantic,et al.  Gauss-Newton Deformable Part Models for Face Alignment In-the-Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[100]  Cordelia Schmid,et al.  Is that you? Metric learning approaches for face identification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[101]  Azriel Rosenfeld,et al.  Face recognition: A literature survey , 2003, CSUR.

[102]  Shengcai Liao,et al.  A Fast and Accurate Unconstrained Face Detector , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[103]  Horst Bischof,et al.  Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[104]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[105]  Dacheng Tao,et al.  Robust Face Recognition via Multimodal Deep Face Representation , 2015, IEEE Transactions on Multimedia.