Video Face Recognition: Component-wise Feature Aggregation Network (C-FAN)

We propose a new approach to video face recognition. Our component-wise feature aggregation network (C-FAN) accepts a set of face images of a subject as an input, and outputs a single feature vector as the face representation of the set for the recognition task. The whole network is trained in two steps: (i) train a base CNN for still image face recognition; (ii) add an aggregation module to the base network to learn the quality value for each feature component, which adaptively aggregates deep feature vectors into a single vector to represent the face in a video. C-FAN automatically learns to retain salient face features with high quality scores while suppressing features with low quality scores. The experimental results on three benchmark datasets, YouTube Faces [39], IJB-A [13], and IJB-S [12] show that the proposed C-FAN network is capable of generating a compact feature vector with 512 dimensions for a video sequence by efficiently aggregating feature vectors of all the video frames to achieve state of the art performance.

[1]  Gang Wang,et al.  Image Set Classification Using Holistic Multiple Order Statistics Features and Localized Multi-kernel Metric Learning , 2013, 2013 IEEE International Conference on Computer Vision.

[2]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[3]  Lei Zhang,et al.  Face recognition based on regularized nearest points between image sets , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[4]  Tal Hassner,et al.  Do We Really Need to Collect Millions of Faces for Effective Face Recognition? , 2016, ECCV.

[5]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[6]  Dongqing Zhang,et al.  Neural Aggregation Network for Video Face Recognition , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Carlos D. Castillo,et al.  Crystal Loss and Quality Pooling for Unconstrained Face Verification and Recognition , 2018, ArXiv.

[8]  Venu Govindaraju,et al.  Metadata-Based Feature Aggregation Network for Face Recognition , 2018, 2018 International Conference on Biometrics (ICB).

[9]  Xiaogang Wang,et al.  Deeply learned face representations are sparse, selective, and robust , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Yu Liu,et al.  Quality Aware Network for Set to Set Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  David J. Kriegman,et al.  Video-based face recognition using probabilistic appearance manifolds , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[12]  Gérard G. Medioni,et al.  Pose-Aware Face Recognition in the Wild , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Carlos D. Castillo,et al.  L2-constrained Softmax Loss for Discriminative Face Verification , 2017, ArXiv.

[14]  Xing Ji,et al.  CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Shiguang Shan,et al.  Log-Euclidean Metric Learning on Symmetric Positive Definite Manifold with Application to Image Set Classification , 2015, ICML.

[16]  Jiwen Lu,et al.  Learning Discriminative Aggregation Network for Video-Based Face Recognition and Person Re-identification , 2017, International Journal of Computer Vision.

[17]  Anil K. Jain,et al.  Improving Face Recognition by Exploring Local Features with Visual Attention , 2018, 2018 International Conference on Biometrics (ICB).

[18]  Anil K. Jain,et al.  Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Tal Hassner,et al.  Face recognition in unconstrained videos with matched background similarity , 2011, CVPR 2011.

[20]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Kihyuk Sohn,et al.  Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[22]  Likun Huang,et al.  Face recognition based on image sets , 2014 .

[23]  Carlos D. Castillo,et al.  Unconstrained Still/Video-Based Face Verification with Deep Convolutional Neural Networks , 2016, International Journal of Computer Vision.

[24]  Liming Chen,et al.  DeepVisage: Making Face Recognition Simple Yet With Powerful Generalization Skills , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[25]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Brian C. Lovell,et al.  Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching , 2011, CVPR 2011.

[27]  Anil K. Jain,et al.  Unconstrained Face Recognition: Identifying a Person of Interest From a Media Collection , 2014, IEEE Transactions on Information Forensics and Security.

[28]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[29]  Trevor Darrell,et al.  Face recognition with image sets using manifold density divergence , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30]  Jongmoo Choi,et al.  Pooling Faces: Template Based Face Recognition with Pooled Face Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[31]  Carlos D. Castillo,et al.  Triplet probabilistic embedding for face verification and clustering , 2016, 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[32]  Xiaogang Wang,et al.  Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[33]  Anil K. Jain,et al.  Face Search at Scale , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Bhiksha Raj,et al.  SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Anil K. Jain,et al.  IJB–S: IARPA Janus Surveillance Video Benchmark , 2018, 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[36]  Trevor Darrell,et al.  Face Recognition from Long-Term Observations , 2002, ECCV.

[37]  Yuxiao Hu,et al.  MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition , 2016, ECCV.

[38]  Jian Cheng,et al.  Additive Margin Softmax for Face Verification , 2018, IEEE Signal Processing Letters.

[39]  Dacheng Tao,et al.  Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Chang Huang,et al.  Targeting Ultimate Accuracy: Face Recognition via Deep Embedding , 2015, ArXiv.

[41]  Qiong Cao,et al.  Template Adaptation for Face Verification and Identification , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[42]  Gang Hua,et al.  Eigen-PEP for Video Face Recognition , 2014, ACCV.