Transformers in Pedestrian Image Retrieval and Person Re-Identification in a Multi-Camera Surveillance System

Person re-identification (re-ID) is an essential task in computer vision, particularly in surveillance applications. Given an input image of a person, the aim is to identify that person among surveillance images captured in various scenarios. Most person re-ID techniques use Convolutional Neural Networks (CNNs); however, Vision Transformers (ViTs) are increasingly replacing pure CNNs in computer vision tasks such as object recognition and image classification. Because a vision transformer processes an image as a sequence of patches, its tokens carry information about local regions of the image, and current techniques exploit this property to improve accuracy on the task at hand. We propose combining vision transformers with vanilla CNN models to investigate the true strength of transformers in person re-identification. We employ three backbones, each combined with vision transformers in different configurations, on two benchmark datasets. The overall performance of the backbones improved, showing the importance of vision transformers. We also provide ablation studies that show the importance of the individual components of the vision transformers in re-identification tasks.
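The abstract describes the CNN-plus-transformer idea only at a high level. The following is a minimal NumPy sketch, under our own assumptions, of the general pattern: a CNN feature map is split into tokens (one per spatial cell, i.e. one per local image region), the tokens pass through a single transformer-style self-attention layer with a residual connection, and the pooled embedding is used for retrieval by ranking a gallery by cosine similarity. The random weights and feature maps, the tensor sizes, the single attention head, the mean pooling, and the function names `patchify`, `self_attention`, `embed` and `rank_gallery` are all hypothetical and do not represent the paper's architecture; a real system would use a pretrained CNN backbone and trained transformer layers.

```python
import numpy as np


def patchify(feature_map: np.ndarray) -> np.ndarray:
    """Turn a (C, H, W) feature map into H*W tokens of dimension C.

    Token i is the feature vector at spatial cell (i // W, i % W), so each
    token describes one local region of the input image.
    """
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w).T  # shape (H*W, C)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v


def embed(feature_map, w_q, w_k, w_v):
    """Hypothetical hybrid embedding: CNN tokens -> attention -> L2-normalized vector."""
    tokens = patchify(feature_map)
    attended = tokens + self_attention(tokens, w_q, w_k, w_v)  # residual connection
    pooled = attended.mean(axis=0)  # average over all local-region tokens
    return pooled / np.linalg.norm(pooled)


def rank_gallery(query_emb, gallery_embs):
    """Return gallery indices sorted from most to least similar (cosine similarity)."""
    sims = gallery_embs @ query_emb  # embeddings are unit-norm, so this is cosine
    return np.argsort(-sims)


# Toy usage with random stand-ins for CNN feature maps (hypothetical sizes).
rng = np.random.default_rng(0)
C, H, W = 8, 4, 2
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
gallery_maps = rng.standard_normal((5, C, H, W))
gallery = np.stack([embed(m, w_q, w_k, w_v) for m in gallery_maps])

# A query that is a slightly perturbed view of gallery item 3.
query_map = gallery_maps[3] + 0.01 * rng.standard_normal((C, H, W))
query = embed(query_map, w_q, w_k, w_v)
ranking = rank_gallery(query, gallery)
print(ranking)
```

In this toy setup the perturbed query should retrieve gallery item 3 first. The residual connection keeps `w_v` square (C by C) so the attention output can be added back to the tokens; published ViT-based re-ID models typically add positional embeddings, multiple heads, layer normalization and a learned class token, none of which are modeled here.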
