Joint Registration and Representation Learning for Unconstrained Face Identification

Recent advances in deep learning have resulted in human-level performances on popular unconstrained face datasets including Labeled Faces in the Wild and YouTube Faces. To further advance research, IJB-A benchmark was recently introduced with more challenges especially in the form of extreme head poses. Registration of such faces is quite demanding and often requires laborious procedures like facial landmark localization. In this paper, we propose a Convolutional Neural Networks based data-driven approach which learns to simultaneously register and represent faces. We validate the proposed scheme on template based unconstrained face identification. Here, a template contains multiple media in the form of images and video frames. Unlike existing methods which synthesize all template media information at feature level, we propose to keep the template media intact. Instead, we represent gallery templates by their trained one-vs-rest discriminative models and then employ a Bayesian strategy which optimally fuses decisions of all medias in a query template. We demonstrate the efficacy of the proposed scheme on IJB-A, YouTube Celebrities and COX datasets where our approach achieves significant relative performance boosts of 3.6%, 21.6% and 12.8% respectively.

[1]  Larry S. Davis,et al.  Covariance discriminative learning: A natural and efficient approach to image set classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Ken-ichi Maeda,et al.  Face recognition using temporal image sequence , 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[3]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Mohammed Bennamoun,et al.  Reverse Training: An Efficient Approach for Image Set Classification , 2014, ECCV.

[5]  Gang Hua,et al.  Labeled Faces in the Wild: A Survey , 2016 .

[6]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Hyun-Chul Kim,et al.  Bayesian Classifier Combination , 2012, AISTATS.

[8]  Likun Huang,et al.  Face recognition based on image sets , 2014 .

[9]  Hakan Cevikalp,et al.  Face recognition based on image sets , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Mohammed Bennamoun,et al.  Deep Reconstruction Models for Image Set Classification , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Lei Zhang,et al.  Face recognition based on regularized nearest points between image sets , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[13]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[14]  Wen Gao,et al.  Manifold-Manifold Distance with application to face recognition based on image set , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Stephen J. Roberts,et al.  Dynamic Bayesian Combination of Multiple Imperfect Classifiers , 2012, Decision Making and Imperfection.

[16]  Ming-Hsuan Yang,et al.  Incremental Learning for Robust Visual Tracking , 2008, International Journal of Computer Vision.

[17]  Anil K. Jain,et al.  Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Vladimir Pavlovic,et al.  Face tracking and recognition with visual constraints in real-world videos , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Swami Sankaranarayanan,et al.  Triplet Similarity Embedding for Face Verification , 2016, ArXiv.

[20]  Shiguang Shan,et al.  Benchmarking Still-to-Video Face Recognition via Partial and Local Linear Discriminant Analysis on COX-S2V Dataset , 2012, ACCV.

[21]  Tal Hassner,et al.  Face recognition in unconstrained videos with matched background similarity , 2011, CVPR 2011.

[22]  Mubarak Shah,et al.  Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Wen Gao,et al.  Face recognition using Ada-Boosted Gabor features , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[24]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[25]  Anil K. Jain,et al.  Face Search at Scale , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Mohammed Bennamoun,et al.  Empowering Simple Binary Classifiers for Image Set Based Face Recognition , 2017, International Journal of Computer Vision.

[27]  Jun-Cheng Chen,et al.  An End-to-End System for Unconstrained Face Verification with Deep Convolutional Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[28]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[29]  Ruiping Wang,et al.  Manifold Discriminant Analysis , 2009, CVPR.

[30]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[31]  Mohammed Bennamoun,et al.  A Spatial Layout and Scale Invariant Feature Representation for Indoor Scene Classification , 2015, IEEE Transactions on Image Processing.

[32]  Mohammed Bennamoun,et al.  Learning Non-linear Reconstruction Models for Image Set Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[34]  Mohammed Bennamoun,et al.  A Discriminative Representation of Convolutional Features for Indoor Scene Recognition , 2015, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.

[35]  Josef Kittler,et al.  Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Subhransu Maji,et al.  One-to-many face recognition with bilinear CNNs , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[37]  Ajmal S. Mian,et al.  Face Recognition Using Sparse Approximated Nearest Points between Image Sets , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  David Zhang,et al.  From Point to Set: Extend the Learning of Distance Metrics , 2013, 2013 IEEE International Conference on Computer Vision.

[39]  Qiong Cao,et al.  Template Adaptation for Face Verification and Identification , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[40]  Jongmoo Choi,et al.  Pooling Faces: Template Based Face Recognition with Pooled Face Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[41]  Ramakant Nevatia,et al.  Face recognition using deep multi-pose representations , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[42]  Rama Chellappa,et al.  Unconstrained face verification using deep CNN features , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[43]  Xiaogang Wang,et al.  Deeply learned face representations are sparse, selective, and robust , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.