An Automatic System for Unconstrained Video-Based Face Recognition

Although deep learning approaches have surpassed human performance on still image-based face recognition, unconstrained video-based face recognition remains a challenging task because of the large volume of data to be processed and the intra- and inter-video variations in pose, illumination, occlusion, scene, blur, video quality, and other factors. In this work, we consider challenging scenarios for unconstrained video-based face recognition from multiple-shot videos and from surveillance videos with low-quality frames. To handle these problems, we propose a robust and efficient system composed of modules for face/fiducial detection, face association, and face recognition. First, we use multi-scale single-shot face detectors to efficiently localize faces in videos. The detected faces are then grouped by carefully designed face association methods, especially for multiple-shot videos. Finally, the faces are recognized by the proposed face matcher, which is based on an unsupervised subspace learning approach and a subspace-to-subspace similarity metric. Extensive experiments on challenging video datasets, such as Multiple Biometric Grand Challenge (MBGC), Face and Ocular Challenge Series (FOCS), and IARPA Janus Surveillance Video Benchmark (IJB-S) for low-quality surveillance videos, and IARPA Janus Benchmark B (IJB-B) for multiple-shot videos, demonstrate that the proposed system can accurately detect and associate faces from unconstrained videos and effectively learn robust and discriminative features for recognition.
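The abstract does not give the details of the face matcher, so the following is only a minimal sketch of the general idea of a subspace-to-subspace similarity, not the proposed method. It assumes that each group of associated faces has already been converted to deep feature vectors, learns an orthonormal basis for each group with a plain SVD (no labels are used), and compares two bases with a projection similarity based on principal angles. The function names `learn_subspace` and `subspace_similarity`, the subspace dimension, and the choice of metric are all illustrative assumptions.

```python
import numpy as np


def learn_subspace(features, dim):
    """Return an orthonormal basis (d x k, k <= dim) for a set of feature vectors.

    Assumption: `features` has shape (n_faces, d), one row per detected face.
    The basis is taken from the top right singular vectors (no mean-centering).
    """
    x = np.asarray(features, dtype=float)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:dim].T


def subspace_similarity(p1, p2):
    """Projection similarity ||P1^T P2||_F^2 / min(k1, k2), a value in [0, 1].

    This equals the mean squared cosine of the principal angles between
    the two subspaces: 1 for identical subspaces, 0 for orthogonal ones.
    """
    k = min(p1.shape[1], p2.shape[1])
    return float(np.linalg.norm(p1.T @ p2, "fro") ** 2 / k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical features for two face groups (20 and 15 faces, 8-D features).
    group_a = rng.normal(size=(20, 8))
    group_b = rng.normal(size=(15, 8))
    basis_a = learn_subspace(group_a, dim=3)
    basis_b = learn_subspace(group_b, dim=3)
    print("self similarity :", subspace_similarity(basis_a, basis_a))
    print("cross similarity:", subspace_similarity(basis_a, basis_b))
```

Representing a whole face group by one subspace means that a video-to-video comparison costs a single small matrix product rather than a comparison between every pair of frames, which is one plausible reason to prefer a set-level metric when the volume of video data is large.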
