TransReID: Transformer-based Object Re-Identification

Extracting robust feature representation is one of the key challenges in object re-identification (ReID). Although convolution neural network (CNN)-based methods have achieved great success, they only process one local neighborhood at a time and suffer from information loss on details caused by convolution and downsampling operators (e.g. pooling and strided convolution). To overcome these limitations, we propose a pure transformer-based object ReID framework named TransReID. Specifically, we first encode an image as a sequence of patches and build a transformer-based strong baseline with a few critical improvements, which achieves competitive results on several ReID benchmarks with CNN-based methods. To further enhance the robust feature learning in the context of transformers, two novel modules are carefully designed. (i) The jigsaw patch module (JPM) is proposed to rearrange the patch embeddings via shift and patch shuffle operations which generates robust features with improved discrimination ability and more diversified coverage. (ii) The side information embeddings (SIE) is introduced to mitigate feature bias towards camera/view variations by plugging in learnable embeddings to incorporate these non-visual clues. To the best of our knowledge, this is the first work to adopt a pure transformer for ReID research. Experimental results of TransReID are superior promising, which achieve state-of-the-art performance on both person and vehicle ReID benchmarks. Code is available at https://github.com/heshuting555/TransReID.

[1]  Shengjin Wang,et al.  Towards Discriminative Representation Learning for Unsupervised Person Re-identification , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Rong Jin,et al.  Graph Convolution for Re-Ranking in Person Re-Identification , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3]  Yongdong Zhang,et al.  Diverse Part Discovery: Occluded Person Re-identification with Part-Aware Transformer , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Pichao Wang,et al.  Trear: Transformer-Based RGB-D Egocentric Action Recognition , 2021, IEEE Transactions on Cognitive and Developmental Systems.

[5]  Fahad Shahbaz Khan,et al.  Transformers in Vision: A Survey , 2021, ACM Comput. Surv..

[6]  Tao Xiang,et al.  Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  P. Luo,et al.  TransTrack: Multiple-Object Tracking with Transformer , 2020, ArXiv.

[8]  Matthieu Cord,et al.  Training data-efficient image transformers & distillation through attention , 2020, ICML.

[9]  Eric Tzeng,et al.  Toward Transformer-Based Object Detection , 2020, ArXiv.

[10]  Pichao Wang,et al.  Transformer guided geometry model for flow-based unsupervised visual odometry , 2020, Neural Computing and Applications.

[11]  Wen Gao,et al.  Pre-Trained Image Processing Transformer , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[13]  Chenggang Yan,et al.  Beyond the Parts: Learning Multi-view Cross-part Correlation for Vehicle Re-identification , 2020, ACM Multimedia.

[14]  Pichao Wang,et al.  Exploiting Better Feature Aggregation for Video Object Detection , 2020, ACM Multimedia.

[15]  Yilong Yin,et al.  CFVMNet: A Multi-branch Network for Vehicle Re-identification Based on Common Field of View , 2020, ACM Multimedia.

[16]  Shao-Yi Chien,et al.  Orientation-aware Vehicle Re-identification with Semantics-guided Part Attention Network , 2020, ECCV.

[17]  Shuicheng Yan,et al.  ConvBERT: Improving BERT with Span-based Dynamic Convolution , 2020, NeurIPS.

[18]  Ming Tang,et al.  Identity-Guided Human Semantic Parsing for Person Re-Identification , 2020, ECCV.

[19]  Mark Chen,et al.  Generative Pretraining From Pixels , 2020, ICML.

[20]  Rongrong Ji,et al.  Salience-Guided Cascaded Suppression Network for Person Re-Identification , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Jianhuang Lai,et al.  Smoothing Adversarial Domain Attack and P-Memory Reconsolidation for Cross-Domain Person Re-Identification , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Nicolas Usunier,et al.  End-to-End Object Detection with Transformers , 2020, ECCV.

[23]  Song Han,et al.  Lite Transformer with Long-Short Range Attention , 2020, ICLR.

[24]  Wei Jiang,et al.  Multi-Domain Learning and Identity Mining for Vehicle Re-Identification , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[25]  Chongruo Wu,et al.  ResNeSt: Split-Attention Networks , 2020, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[26]  Edouard Grave,et al.  Training with Quantization Noise for Extreme Model Compression , 2020, ICLR.

[27]  R. Chellappa,et al.  The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification , 2020, ECCV.

[28]  Qingming Huang,et al.  Parsing-Based View-Aware Embedding Network for Vehicle Re-Identification , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Gang Yu,et al.  High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Yichen Wei,et al.  Circle Loss: A Unified Perspective of Pair Similarity Optimization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Retrieval , 2020, Definitions.

[32]  Calton Pu,et al.  Looking GLAMORous: Vehicle Re-Id in Heterogeneous Cameras Networks with Global and Local Attention , 2020, ArXiv.

[33]  Longhui Wei,et al.  Rethinking the Distribution Gap of Person Re-identification with Camera-Based Batch Normalization , 2020, ECCV.

[34]  Wenjun Zeng,et al.  Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification , 2020, AAAI.

[35]  Tao Xiang,et al.  Deep Learning for Person Re-Identification: A Survey and Outlook , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Wei Jiang,et al.  Stripe-based and attribute-aware network: a two-branch deep model for vehicle re-identification , 2019, ArXiv.

[37]  Yu Wu,et al.  Pose-Guided Feature Alignment for Occluded Person Re-Identification , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38]  Yichen Wei,et al.  Vehicle Re-Identification With Viewpoint-Aware Metric Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[39]  Wei Jiang,et al.  AlignedReID++: Dynamically matching local information for person re-identification , 2019, Pattern Recognit..

[40]  Edouard Grave,et al.  Reducing Transformer Depth on Demand with Structured Dropout , 2019, ICLR.

[41]  Chunhua Shen,et al.  Part-Guided Attention Learning for Vehicle Re-Identification , 2019, arXiv.org.

[42]  Christian Poellabauer,et al.  Second-Order Non-Local Attention Networks for Person Re-Identification , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Weihong Deng,et al.  Mixed High-Order Attention Network for Person Re-Identification , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[44]  Yang Yang,et al.  ABD-Net: Attentive but Diverse Person Re-Identification , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[45]  Bing He,et al.  Part-Regularized Near-Duplicate Vehicle Re-Identification , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Xin Jin,et al.  Semantics-Aligned Representation Learning for Person Re-identification , 2019, AAAI.

[47]  Andrea Cavallaro,et al.  Omni-Scale Feature Learning for Person Re-Identification , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[48]  Zhenan Sun,et al.  Foreground-Aware Pyramid Reconstruction for Alignment-Free Occluded Person Re-Identification , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49]  Cuiling Lan,et al.  Relation-Aware Global Attention for Person Re-Identification , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Wei Jiang,et al.  Bag of Tricks and a Strong Baseline for Deep Person Re-Identification , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[51]  Andrew Zisserman,et al.  Video Action Transformer Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Liang Zheng,et al.  Dissecting Person Re-Identification From the Viewpoint of Viewpoint , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Xiong Chen,et al.  Learning Discriminative Features with Multiple Granularities for Person Re-Identification , 2018, ACM Multimedia.

[54]  David Rolnick,et al.  How to Start Training: The Effect of Initialization and Architecture , 2018, NeurIPS.

[55]  Shaogang Gong,et al.  Harmonious Attention Network for Person Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[56]  Dustin Tran,et al.  Image Transformer , 2018, ICML.

[57]  Haiqing Li,et al.  Deep Spatial Feature Reconstruction for Partial Person Re-identification: Alignment-free Approach , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[58]  Qi Tian,et al.  Beyond Part Models: Person Retrieval with Refined Part Pooling , 2017, ECCV.

[59]  Longhui Wei,et al.  Person Transfer GAN to Bridge Domain Gap for Person Re-identification , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[60]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[61]  Frank Hutter,et al.  Fixing Weight Decay Regularization in Adam , 2017, ArXiv.

[62]  Kaiqi Huang,et al.  Learning Deep Context-Aware Features over Body and Latent Parts for Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Q. Tian,et al.  GLAD: Global-Local-Alignment Descriptor for Pedestrian Retrieval , 2017, ACM Multimedia.

[64]  Yi Yang,et al.  Random Erasing Data Augmentation , 2017, AAAI.

[65]  Hantao Yao,et al.  Deep Representation Learning With Part Loss for Person Re-Identification , 2017, IEEE Transactions on Image Processing.

[66]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[67]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[68]  Raquel Urtasun,et al.  Understanding the Effective Receptive Field in Deep Convolutional Neural Networks , 2016, NIPS.

[69]  Yi Yang,et al.  A Discriminatively Learned CNN Embedding for Person Reidentification , 2016, ACM Trans. Multim. Comput. Commun. Appl..

[70]  Ramprasaath R. Selvaraju,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, International Journal of Computer Vision.

[71]  Francesco Solera,et al.  Performance Measures and a Data Set for Multi-target, Multi-camera Tracking , 2016, ECCV Workshops.

[72]  Wu Liu,et al.  Large-scale vehicle re-identification in urban surveillance videos , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).

[73]  Tiejun Huang,et al.  Deep Relative Distance Learning: Tell the Difference between Similar Vehicles , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[74]  Shuicheng Yan,et al.  End-to-End Comparative Attention Networks for Person Re-Identification , 2016, IEEE Transactions on Image Processing.

[75]  Kilian Q. Weinberger,et al.  Deep Networks with Stochastic Depth , 2016, ECCV.

[76]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[77]  Qi Tian,et al.  Scalable Person Re-identification: A Benchmark , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[78]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.