Uncertainty Modeling of Contextual-Connections Between Tracklets for Unconstrained Video-Based Face Recognition

Unconstrained video-based face recognition is a challenging problem due to significant within-video variations caused by pose, occlusion and blur. To tackle this problem, an effective idea is to propagate the identity from high-quality faces to low-quality ones through contextual connections, which are constructed based on context such as body appearance. However, previous methods have often propagated erroneous information due to lack of uncertainty modeling of the noisy contextual connections. In this paper, we propose the Uncertainty-Gated Graph (UGG), which conducts graph-based identity propagation between tracklets, which are represented by nodes in a graph. UGG explicitly models the uncertainty of the contextual connections by adaptively updating the weights of the edge gates according to the identity distributions of the nodes during inference. UGG is a generic graphical model that can be applied at only inference time or with end-to-end training. We demonstrate the effectiveness of UGG with state-of-the-art results in the recently released challenging Cast Search in Movies and IARPA Janus Surveillance Video Benchmark dataset.

[1]  Yu-Chiang Frank Wang,et al.  Multi-label Zero-Shot Learning with Structured Knowledge Graphs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[3]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Anil K. Jain,et al.  IARPA Janus Benchmark-B Face Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[5]  Rama Chellappa,et al.  Face Association across Unconstrained Video Frames Using Conditional Random Fields , 2012, ECCV.

[6]  Dacheng Tao,et al.  Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  C. V. Jawahar,et al.  Face Recognition in Videos by Label Propagation , 2014, 2014 22nd International Conference on Pattern Recognition.

[8]  Carlos D. Castillo,et al.  Unconstrained Still/Video-Based Face Verification with Deep Convolutional Neural Networks , 2016, International Journal of Computer Vision.

[9]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[10]  Zoubin Ghahramani,et al.  Learning from labeled and unlabeled data with label propagation , 2002 .

[11]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[12]  Abhinav Gupta,et al.  Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Ah Chung Tsoi,et al.  The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.

[14]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Joan Bruna,et al.  Deep Convolutional Networks on Graph-Structured Data , 2015, ArXiv.

[16]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[17]  Carlos D. Castillo,et al.  Crystal Loss and Quality Pooling for Unconstrained Face Verification and Recognition , 2018, ArXiv.

[18]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Greg Mori,et al.  Learning Structured Inference Neural Networks with Label Relations , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Stefanos Zafeiriou,et al.  ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Xiaogang Wang,et al.  Person Re-identification with Deep Similarity-Guided Graph Neural Network , 2018, ECCV.

[22]  Tal Hassner,et al.  Face recognition in unconstrained videos with matched background similarity , 2011, CVPR 2011.

[23]  Xiaogang Wang,et al.  Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[24]  Sixue Gong,et al.  Recurrent Embedding Aggregation Network for Video Face Recognition , 2019, ArXiv.

[25]  Sixue Gong,et al.  Video Face Recognition: Component-wise Feature Aggregation Network (C-FAN) , 2019, 2019 International Conference on Biometrics (ICB).

[26]  Carlos D. Castillo,et al.  An Automatic System for Unconstrained Video-Based Face Recognition , 2018, IEEE Transactions on Biometrics, Behavior, and Identity Science.

[27]  Anil K. Jain,et al.  IJB–S: IARPA Janus Surveillance Video Benchmark , 2018, 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS).

[28]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[30]  Anil K. Jain,et al.  IARPA Janus Benchmark - C: Face Dataset and Protocol , 2018, 2018 International Conference on Biometrics (ICB).

[31]  Rama Chellappa,et al.  Unconstrained face verification using deep CNN features , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[32]  Xiaogang Wang,et al.  Deeply learned face representations are sparse, selective, and robust , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Truong Q. Nguyen,et al.  Detecting temporally consistent objects in videos through object class label propagation , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[34]  Dahua Lin,et al.  Person Search in Videos with One Portrait Through Visual and Temporal Links , 2018, ECCV.

[35]  Juergen Gall,et al.  Real-Time Semantic Segmentation with Label Propagation , 2016, ECCV Workshops.