Recurrent Embedding Aggregation Network for Video Face Recognition

Recurrent networks have been successful in analyzing temporal data and have been widely used for video analysis. However, for video face recognition, where the base CNNs trained on large-scale data already provide discriminative features, using Long Short-Term Memory (LSTM), a popular recurrent network, for feature learning could lead to overfitting and degrade the performance instead. We propose a Recurrent Embedding Aggregation Network (REAN) for set to set face recognition. Compared with LSTM, REAN is robust against overfitting because it only learns how to aggregate the pre-trained embeddings rather than learning representations from scratch. Compared with quality-aware aggregation methods, REAN can take advantage of the context information to circumvent the noise introduced by redundant video frames. Empirical results on three public domain video face recognition datasets, IJB-S, YTF, and PaSC show that the proposed REAN significantly outperforms naive CNN-LSTM structure and quality-aware aggregation methods.

[1]  Jiwen Lu,et al.  Attention-Aware Deep Reinforcement Learning for Video Face Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Brian C. Lovell,et al.  Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching , 2011, CVPR 2011.

[3]  Li Shen,et al.  Comparator Networks , 2018, ECCV.

[4]  Christopher Joseph Pal,et al.  Recurrent Neural Networks for Emotion Recognition in Video , 2015, ICMI.

[5]  Eric Granger,et al.  Using deep autoencoders to learn robust domain-invariant representations for still-to-video face recognition , 2017, 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[6]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Ajmal S. Mian,et al.  Sparse approximated nearest points for image set classification , 2011, CVPR 2011.

[8]  Erfan Mohagheghian,et al.  An application of evolutionary algorithms for WAG optimisation in the Norne Field , 2016 .

[9]  Gang Hua,et al.  Eigen-PEP for Video Face Recognition , 2014, ACCV.

[10]  Tong Zhang,et al.  Spatial–Temporal Recurrent Neural Network for Emotion Recognition , 2017, IEEE Transactions on Cybernetics.

[11]  Gang Wang,et al.  Image Set Classification Using Holistic Multiple Order Statistics Features and Localized Multi-kernel Metric Learning , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Stephanie Schuckers,et al.  CNN based key frame extraction for face in video recognition , 2018, 2018 IEEE 4th International Conference on Identity, Security, and Behavior Analysis (ISBA).

[13]  Chao Yang,et al.  Dependency-Aware Attention Control for Unconstrained Face Recognition with Image Sets , 2018, ECCV.

[14]  Anil K. Jain,et al.  Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Lei Zhang,et al.  Face recognition based on regularized nearest points between image sets , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[16]  Wen Gao,et al.  Manifold-Manifold Distance with application to face recognition based on image set , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Tal Hassner,et al.  Face recognition in unconstrained videos with matched background similarity , 2011, CVPR 2011.

[18]  Dongqing Zhang,et al.  Neural Aggregation Network for Video Face Recognition , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  D. P. DEE,et al.  Bias and data assimilation , 2005 .

[20]  Anil K. Jain,et al.  Unconstrained Face Recognition: Identifying a Person of Interest From a Media Collection , 2014, IEEE Transactions on Information Forensics and Security.

[21]  Fang Zhao,et al.  Multi-Prototype Networks for Unconstrained Set-based Face Recognition , 2019, IJCAI.

[22]  Carlos D. Castillo,et al.  The Do’s and Don’ts for CNN-Based Face Verification , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[23]  Trevor Darrell,et al.  Face recognition with image sets using manifold density divergence , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Luc Van Gool,et al.  A Riemannian Network for SPD Matrix Learning , 2016, AAAI.

[26]  Lucas Beyer,et al.  In Defense of the Triplet Loss for Person Re-Identification , 2017, ArXiv.

[27]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[28]  Sixue Gong,et al.  Video Face Recognition: Component-wise Feature Aggregation Network (C-FAN) , 2019, 2019 International Conference on Biometrics (ICB).

[29]  Yu Liu,et al.  Quality Aware Network for Set to Set Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Shiguang Shan,et al.  Log-Euclidean Metric Learning on Symmetric Positive Definite Manifold with Application to Image Set Classification , 2015, ICML.

[31]  Carlos D. Castillo,et al.  Crystal Loss and Quality Pooling for Unconstrained Face Verification and Recognition , 2018, ArXiv.

[32]  Bruce A. Draper,et al.  The challenge of face recognition from digital point-and-shoot cameras , 2013, 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS).

[33]  Xiaogang Wang,et al.  Deeply learned face representations are sparse, selective, and robust , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Carlos D. Castillo,et al.  Unconstrained Still/Video-Based Face Verification with Deep Convolutional Neural Networks , 2016, International Journal of Computer Vision.

[35]  Yong Ren,et al.  Pose invariant face recognition using Cellular Simultaneous Recurrent Networks , 2009, 2009 International Joint Conference on Neural Networks.

[36]  Ming-Hsuan Yang,et al.  Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Jiwen Lu,et al.  Learning Discriminative Aggregation Network for Video-Based Face Recognition and Person Re-identification , 2017, International Journal of Computer Vision.

[38]  Jan Kautz,et al.  Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Yuxiao Hu,et al.  MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition , 2016, ECCV.

[40]  Jongmoo Choi,et al.  Pooling Faces: Template Based Face Recognition with Pooled Face Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[41]  Trevor Darrell,et al.  Face Recognition from Long-Term Observations , 2002, ECCV.

[42]  Luc Van Gool,et al.  Building Deep Networks on Grassmann Manifolds , 2016, AAAI.

[43]  Jürgen Schmidhuber,et al.  Facial Expression Recognition with Recurrent Neural Networks , 2008 .

[45]  Bhiksha Raj,et al.  SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Anil K. Jain,et al.  IJB–S: IARPA Janus Surveillance Video Benchmark , 2018, 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[47]  Andrew Zisserman,et al.  Multicolumn Networks for Face Recognition , 2018, BMVC.

[48]  Dacheng Tao,et al.  Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Hakan Cevikalp,et al.  Face recognition based on image sets , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[50]  Andrew Zisserman,et al.  GhostVLAD for set-based face recognition , 2018, ACCV.

[51]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[52]  Yuanliu Liu,et al.  Video-based emotion recognition using CNN-RNN and C3D hybrid networks , 2016, ICMI.

[53]  David J. Kriegman,et al.  Video-based face recognition using probabilistic appearance manifolds , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..