论文信息 - Video Face Recognition: Component-wise Feature Aggregation Network (C-FAN)

Video Face Recognition: Component-wise Feature Aggregation Network (C-FAN)

We propose a new approach to video face recognition. Our component-wise feature aggregation network (C-FAN) accepts a set of face images of a subject as an input, and outputs a single feature vector as the face representation of the set for the recognition task. The whole network is trained in two steps: (i) train a base CNN for still image face recognition; (ii) add an aggregation module to the base network to learn the quality value for each feature component, which adaptively aggregates deep feature vectors into a single vector to represent the face in a video. C-FAN automatically learns to retain salient face features with high quality scores while suppressing features with low quality scores. The experimental results on three benchmark datasets, YouTube Faces [39], IJB-A [13], and IJB-S [12] show that the proposed C-FAN network is capable of generating a compact feature vector with 512 dimensions for a video sequence by efficiently aggregating feature vectors of all the video frames to achieve state of the art performance.

[1] Gang Wang,et al. Image Set Classification Using Holistic Multiple Order Statistics Features and Localized Multi-kernel Metric Learning , 2013, 2013 IEEE International Conference on Computer Vision.

[2] Yu Qiao,et al. A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[3] Lei Zhang,et al. Face recognition based on regularized nearest points between image sets , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[4] Tal Hassner,et al. Do We Really Need to Collect Millions of Faces for Effective Face Recognition? , 2016, ECCV.

[5] Yu Qiao,et al. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[6] Dongqing Zhang,et al. Neural Aggregation Network for Video Face Recognition , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Carlos D. Castillo,et al. Crystal Loss and Quality Pooling for Unconstrained Face Verification and Recognition , 2018, ArXiv.

[8] Venu Govindaraju,et al. Metadata-Based Feature Aggregation Network for Face Recognition , 2018, 2018 International Conference on Biometrics (ICB).

[9] Xiaogang Wang,et al. Deeply learned face representations are sparse, selective, and robust , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Yu Liu,et al. Quality Aware Network for Set to Set Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] David J. Kriegman,et al. Video-based face recognition using probabilistic appearance manifolds , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[12] Gérard G. Medioni,et al. Pose-Aware Face Recognition in the Wild , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Carlos D. Castillo,et al. L2-constrained Softmax Loss for Discriminative Face Verification , 2017, ArXiv.

[14] Xing Ji,et al. CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15] Shiguang Shan,et al. Log-Euclidean Metric Learning on Symmetric Positive Definite Manifold with Application to Image Set Classification , 2015, ICML.

[16] Jiwen Lu,et al. Learning Discriminative Aggregation Network for Video-Based Face Recognition and Person Re-identification , 2017, International Journal of Computer Vision.

[17] Anil K. Jain,et al. Improving Face Recognition by Exploring Local Features with Visual Attention , 2018, 2018 International Conference on Biometrics (ICB).

[18] Anil K. Jain,et al. Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Tal Hassner,et al. Face recognition in unconstrained videos with matched background similarity , 2011, CVPR 2011.

[20] Ming Yang,et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21] Kihyuk Sohn,et al. Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[22] Likun Huang,et al. Face recognition based on image sets , 2014 .

[23] Carlos D. Castillo,et al. Unconstrained Still/Video-Based Face Verification with Deep Convolutional Neural Networks , 2016, International Journal of Computer Vision.

[24] Liming Chen,et al. DeepVisage: Making Face Recognition Simple Yet With Powerful Generalization Skills , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[25] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Brian C. Lovell,et al. Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching , 2011, CVPR 2011.

[27] Anil K. Jain,et al. Unconstrained Face Recognition: Identifying a Person of Interest From a Media Collection , 2014, IEEE Transactions on Information Forensics and Security.

[28] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[29] Trevor Darrell,et al. Face recognition with image sets using manifold density divergence , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30] Jongmoo Choi,et al. Pooling Faces: Template Based Face Recognition with Pooled Face Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[31] Carlos D. Castillo,et al. Triplet probabilistic embedding for face verification and clustering , 2016, 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[32] Xiaogang Wang,et al. Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[33] Anil K. Jain,et al. Face Search at Scale , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Bhiksha Raj,et al. SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Anil K. Jain,et al. IJB–S: IARPA Janus Surveillance Video Benchmark , 2018, 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[36] Trevor Darrell,et al. Face Recognition from Long-Term Observations , 2002, ECCV.

[37] Yuxiao Hu,et al. MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition , 2016, ECCV.

[38] Jian Cheng,et al. Additive Margin Softmax for Face Verification , 2018, IEEE Signal Processing Letters.

[39] Dacheng Tao,et al. Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40] Chang Huang,et al. Targeting Ultimate Accuracy: Face Recognition via Deep Embedding , 2015, ArXiv.

[41] Qiong Cao,et al. Template Adaptation for Face Verification and Identification , 2016, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[42] Gang Hua,et al. Eigen-PEP for Video Face Recognition , 2014, ACCV.