
Adversarial Embedding and Variational Aggregation for Video Face Recognition

Video face recognition is a challenging problem because video data are often low in quality and highly redundant. To address this efficiently, this paper proposes an adversarial embedding and variational aggregation (AEVA) approach, which uses discriminative information from adversarial learning to implicitly constrain the distributions of video features. AEVA consists of two parts: adversarial embedding learning and variational aggregation learning. The former learns a discriminative feature embedding space through a self-adversarial process, reducing the intra-class distance between samples of the same subject in an adversarial manner. The latter improves the aggregated video representation by forcing the latent feature distribution to be as close as possible to the real feature distribution. An attentional weighting network and a variational inference structure aggregate the features of one video and generate the latent features. Neither part requires the complex sampling strategies or hyperparameter settings of deep metric learning. Our approach achieves state-of-the-art performance for video face recognition on four widely used benchmarks: YouTubeFace, IJB-A, YouTube Celebrities and Celebrity-1000.
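The aggregation step described above (attentional weighting of frame features followed by a variational latent) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a softmax attention over frame features driven by a hypothetical learned scoring vector (`query`), and it stands in for the paper's variational inference structure with the standard VAE reparameterization trick and a KL penalty toward a standard normal prior. The paper instead pushes the latent features toward the real feature distribution, so its actual objective may differ. All function and variable names are hypothetical, and plain Python lists replace tensors to keep the example dependency-free.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(
    frames: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[Vector, Vector]:
    """Weight each frame feature f_k by softmax(query . f_k) and sum them.

    `query` is a hypothetical learned scoring vector, passed in directly here.
    Returns the pooled video feature and the per-frame weights.
    """
    scores = [sum(q * x for q, x in zip(query, f)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


def reparameterize(
    mu: Sequence[float], logvar: Sequence[float], rng: random.Random
) -> Vector:
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (VAE reparameterization)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions.

    Assumption: a standard normal prior stands in for the paper's target
    distribution of real features.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


if __name__ == "__main__":
    rng = random.Random(0)
    # Three toy 4-D frame features from one video; the second is a "low-quality" frame.
    frames = [[1.0, 0.5, 0.0, 0.2], [0.1, 0.0, 0.1, 0.0], [0.9, 0.6, 0.1, 0.3]]
    query = [1.0, 1.0, 0.0, 0.0]  # hypothetical learned attention vector
    pooled, weights = attention_aggregate(frames, query)
    # Hypothetical encoder heads: pooled feature as the mean, fixed log-variance.
    mu, logvar = pooled, [-2.0] * len(pooled)
    z = reparameterize(mu, logvar, rng)
    print("weights:", [round(w, 3) for w in weights])
    print("latent z:", [round(v, 3) for v in z])
    print("KL term:", round(kl_to_standard_normal(mu, logvar), 3))
```

In this toy run the low-quality second frame receives a much smaller attention weight than the other two, which is the intended effect of attentional weighting on redundant or degraded frames. Sampling through `reparameterize` keeps the latent feature differentiable with respect to `mu` and `logvar`, which is why VAE-style structures use it during training.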

Zhenan Sun | Ran He | Bing Yu | Yibo Hu | Lingxiao Song
