Ordered Weighted Aggregation Networks for Video Face Recognition

Abstract: Video face recognition generally includes a step in which the descriptors extracted from each frame are aggregated into a single video face representation. The most commonly used aggregation operator is the average, which gives every frame the same relevance. Some adaptive aggregation algorithms have been developed, but most of them rely on the weighted mean as the aggregation operator and thus disregard many other types of aggregation operators. In this paper, we propose a novel adaptive aggregation scheme based on ordered weighted average (OWA) operators, in contrast with the predominant weighted mean scheme. Besides presenting the theoretical aspects of our aggregation scheme, we develop two concrete implementations to validate its suitability for video face recognition: the Ordered Weighted Aggregation Network (OWANet) and the Weighted OWANet (WOWANet). Both algorithms are based on neural networks and are trainable through gradient descent in a standard supervised learning setting. We conduct extensive experiments on YouTube Faces, COX Face and the IARPA Janus Benchmark A to evaluate recognition performance on verification and identification tasks. The experiments show that both proposals achieve accuracy that is very competitive with existing state-of-the-art methods, while significantly reducing space requirements and inference time.
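To make the contrast between the two operator families concrete, the following minimal sketch implements the classic OWA operator as defined by Yager (1988) next to a plain weighted mean. It is an illustration only, not the OWANet or WOWANet architecture: the function names, the fixed hand-picked weights, and the choice to apply OWA independently to each feature dimension across frames are all assumptions made for this example. In the paper the weights are learned by a neural network.

```python
def owa(values, weights):
    """Ordered weighted average of scalars (Yager, 1988).

    Each weight is attached to a rank position (largest value first),
    not to a particular input.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)  # b_1 >= b_2 >= ... >= b_n
    return sum(w * b for w, b in zip(weights, ordered))


def owa_aggregate(frames, weights):
    """Hypothetical helper: apply OWA to each feature dimension across frames.

    `frames` is a list of equal-length descriptors, one per video frame.
    Treating dimensions independently is an assumption of this sketch.
    """
    return [owa(list(column), weights) for column in zip(*frames)]


def weighted_mean(frames, frame_weights):
    """Weighted mean for comparison: each weight is attached to a frame."""
    return [
        sum(w * x for w, x in zip(frame_weights, column))
        for column in zip(*frames)
    ]


# Two toy 2-D frame descriptors (made-up numbers for illustration).
frames = [[1.0, 4.0], [3.0, 2.0]]
video_owa = owa_aggregate(frames, [0.75, 0.25])
video_mean = weighted_mean(frames, [0.75, 0.25])
```

With the same weight vector the two operators generally disagree, because the weighted mean ties each weight to a frame while OWA ties it to a rank after sorting. Choosing weights (1, 0, ..., 0) turns OWA into the maximum, (0, ..., 0, 1) into the minimum, and uniform weights into the ordinary average, which is why the family covers more aggregation behaviours than the weighted mean alone.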
