Relation-Aware Global Attention for Person Re-Identification

For person re-identification (re-id), attention mechanisms have become attractive because they strengthen discriminative features and suppress irrelevant ones, which matches the core requirement of re-id: discriminative feature learning. Previous approaches typically learn attention using local convolutions and therefore neglect the knowledge available in global structural patterns. Intuitively, the affinities among spatial positions (nodes) in the feature map provide clustering-like information that helps infer semantics and thus attention, especially for person images, where the feasible human poses are constrained. In this work, we propose an effective Relation-Aware Global Attention (RGA) module that captures global structural information for better attention learning. Specifically, for each feature position, to compactly capture both the structural information of global scope and the local appearance information, we stack its relations, i.e., its pairwise correlations/affinities with all the feature positions (e.g., in raster-scan order), together with the feature itself, and learn the attention from this stacked representation with a shallow convolutional model. Extensive ablation studies demonstrate that RGA significantly enhances feature representation power and helps achieve state-of-the-art performance on several popular benchmarks. The source code is available at https://github.com/microsoft/Relation-Aware-Global-Attention-Networks.
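As a rough illustration of the stacking idea described above, the following is a minimal sketch and not the released implementation. It assumes plain dot-product affinities, random untrained weights, and a two-layer per-position model (the equivalent of two 1x1 convolutions) ending in a sigmoid. All function names, shapes, and parameters here are hypothetical and chosen only for the example.

```python
import math
import random


def relation_aware_attention(feats, w1, w2):
    """Return one attention value in (0, 1) per feature position.

    feats: list of N feature vectors (length C), positions in raster-scan order.
    w1:    HIDDEN x (C + N) first-layer weights (hypothetical, untrained).
    w2:    HIDDEN second-layer weights (hypothetical, untrained).
    """
    # Pairwise affinities between all positions. The dot product is an
    # assumption; the abstract only says "pairwise correlations/affinities".
    relations = [[sum(a * b for a, b in zip(fi, fj)) for fj in feats]
                 for fi in feats]

    attention = []
    for feat, rel in zip(feats, relations):
        # Stack the feature itself with its relations to all positions.
        stacked = feat + rel
        # Shallow model applied independently at each position:
        # linear -> ReLU -> linear -> sigmoid.
        hidden = [max(0.0, sum(w * s for w, s in zip(row, stacked)))
                  for row in w1]
        logit = sum(w * h for w, h in zip(w2, hidden))
        attention.append(1.0 / (1.0 + math.exp(-logit)))
    return attention


def apply_attention(feats, attention):
    """Scale each position's feature vector by its attention value."""
    return [[a * v for v in feat] for feat, a in zip(feats, attention)]


# Toy feature map: C channels on an H x W grid, flattened in raster-scan order.
rng = random.Random(0)
C, H, W, HIDDEN = 4, 2, 3, 5
N = H * W
feats = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(N)]
w1 = [[rng.uniform(-0.1, 0.1) for _ in range(C + N)] for _ in range(HIDDEN)]
w2 = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]

attention = relation_aware_attention(feats, w1, w2)
reweighted = apply_attention(feats, attention)
print(len(attention), len(reweighted), len(reweighted[0]))
```

Because every position's input includes its affinities with all N positions, the attention value at each location depends on the whole feature map rather than on a local window. That is the contrast with attention learned from local convolutions that the abstract draws.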
