Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification

Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. Because of large intra-class variations and the cross-modality discrepancy, together with a large amount of sample noise, it is difficult to learn discriminative part features. Existing VI-ReID methods instead tend to learn global representations, which have limited discriminability and weak robustness to noisy images. In this paper, we propose a novel dynamic dual-attentive aggregation (DDAG) learning method that mines both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID. We propose an intra-modality weighted-part attention module that extracts discriminative part-aggregated features by imposing domain knowledge on part relationship mining. To enhance robustness against noisy samples, we introduce cross-modality graph structured attention, which reinforces the representation with contextual relations across the two modalities. We also develop a parameter-free dynamic dual aggregation learning strategy that adaptively integrates the two components through progressive joint training. Extensive experiments demonstrate that DDAG outperforms state-of-the-art methods under various settings.
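The idea of reinforcing a representation with cross-modality contextual relations can be illustrated with a small NumPy sketch. This is not the DDAG implementation: the function name `graph_attention_aggregate`, the scaled dot-product scoring, and the toy batch are assumptions made for illustration only. It builds a graph that links samples of the same identity across both modalities and replaces each feature with an attention-weighted average of its neighbours.

```python
import numpy as np

def graph_attention_aggregate(features, labels):
    """Hypothetical helper: attention-weighted average over same-identity samples."""
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    # Adjacency: connect samples that share an identity (self-loops included),
    # regardless of which modality they come from.
    adjacency = labels[:, None] == labels[None, :]
    # Assumed scoring function: scaled dot-product similarity.
    scores = features @ features.T / np.sqrt(features.shape[1])
    # Mask non-neighbours so they receive zero attention after the softmax.
    scores = np.where(adjacency, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each output row is a convex combination of its neighbours' features.
    return weights @ features, weights

# Toy batch: 4 "visible" followed by 4 "infrared" features, two identities.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
ids = np.array([0, 0, 1, 1, 0, 0, 1, 1])
refined, attn = graph_attention_aggregate(feats, ids)
```

In this sketch a noisy sample is smoothed toward the other samples of its identity, including those from the other modality, which is the intuition behind using graph-level context for robustness.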
