Dual Adversarial Disentanglement and Deep Representation Decorrelation for NIR-VIS Face Recognition

Near-infrared and visual (NIR-VIS) face recognition is the task of matching face data across the two modalities, and it has broad application prospects in areas such as multimedia information retrieval and criminal investigation. However, it remains challenging because of high intra-class variations and the small scale of available NIR-VIS datasets. In this paper, we propose a novel approach called Dual Adversarial Disentanglement and deep Representation Decorrelation (DADRD) to solve the NIR-VIS matching problem. To reduce the gap between NIR and VIS images, the DADRD model comprises three key components: a Cross-modal Margin (CmM) loss, Dual Adversarial Disentangled Variations (DADV), and Deep Representation Decorrelation (DRD). First, the CmM loss captures within-class and between-class information of the data, and it further reduces the modality difference through a center-variation term. Second, the Mixed Facial Representation (MFR) layer of the backbone network is divided into three parts: the identity-related layer, the modality-related layer, and the residual-related layer. DADV, which consists of Adversarial Disentangled Modality Variations (ADMV) and Adversarial Disentangled Residual Variations (ADRV), is designed to reduce intra-class variations. Specifically, ADMV and ADRV aim to eliminate spectrum variations and residual variations (e.g., lighting, pose, expression, and occlusion), respectively, via an adversarial mechanism. Finally, we impose DRD on the three decomposed features to make them uncorrelated with one another, which separates the three kinds of information more effectively and enhances the feature representations. We also develop a Joint Three-stage Optimization (JTsO) strategy to optimize the network effectively. The joint formulation leads to the purification of identity information and the disentanglement of within-class variation information. Extensive experiments on three challenging datasets demonstrate the effectiveness of our method.
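
The abstract does not give the exact form of the DRD term, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the MFR output is a flat feature vector that is sliced into identity, modality, and residual parts, and it assumes decorrelation is measured as the mean squared Pearson correlation between the dimensions of each pair of parts. The function names (`split_mfr`, `cross_correlation_penalty`, `decorrelation_loss`) and the slice sizes are hypothetical.

```python
import numpy as np


def split_mfr(features, dims):
    """Slice a batch of mixed representations into three parts.

    features: array of shape (batch, d_id + d_mod + d_res).
    dims: (d_id, d_mod, d_res), hypothetical sizes chosen for illustration.
    """
    d_id, d_mod, d_res = dims
    if features.shape[1] != d_id + d_mod + d_res:
        raise ValueError("feature width does not match the requested split")
    identity = features[:, :d_id]
    modality = features[:, d_id:d_id + d_mod]
    residual = features[:, d_id + d_mod:]
    return identity, modality, residual


def cross_correlation_penalty(a, b, eps=1e-8):
    """Mean squared Pearson correlation between columns of a and columns of b.

    This is an assumed stand-in for a decorrelation term: it is close to 0
    when the two feature groups are linearly uncorrelated over the batch.
    """
    a = (a - a.mean(axis=0)) / (a.std(axis=0) + eps)
    b = (b - b.mean(axis=0)) / (b.std(axis=0) + eps)
    corr = a.T @ b / a.shape[0]  # shape (dim_a, dim_b)
    return float(np.mean(corr ** 2))


def decorrelation_loss(features, dims):
    """Sum the pairwise penalties over the three decomposed feature groups."""
    identity, modality, residual = split_mfr(features, dims)
    return (
        cross_correlation_penalty(identity, modality)
        + cross_correlation_penalty(identity, residual)
        + cross_correlation_penalty(modality, residual)
    )
```

Under these assumptions, a batch whose three parts are copies of one another receives a large penalty, while a batch of independent random features receives a penalty near zero. In a real training setup this quantity would be computed on network activations in an autodiff framework and combined with the CmM and adversarial terms, whose exact forms are not stated here.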
