Short range correlation transformer for occluded person re-identification

Occluded person re-identification is a challenging area of computer vision that suffers from inefficient feature representation and low recognition accuracy. Convolutional neural networks focus on extracting local features, so they struggle to extract features of occluded pedestrians and their results are unsatisfactory. Recently, the vision transformer has been introduced into the field of re-identification and achieves state-of-the-art results by modeling global feature relationships between patch sequences. However, the vision transformer is inferior to convolutional neural networks at extracting local features. We therefore design a partial feature transformer-based person re-identification framework named PFT. The proposed PFT uses three modules to enhance the efficiency of the vision transformer. (1) Patch full dimension enhancement module. We design a learnable tensor with the same size as the patch sequences, which is full-dimensional and deeply embedded in the patch sequences to enrich the diversity of training samples. (2) Fusion and reconstruction module. We extract the less important part of the obtained patch sequences and fuse it with the original patch sequences to reconstruct them. (3) Spatial slicing module. We slice and group the patch sequences along the spatial direction, which effectively improves the short-range correlation of the patch sequences. Experimental results on occluded and holistic re-identification datasets demonstrate that the proposed PFT network consistently achieves superior performance and outperforms state-of-the-art methods.
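As a rough illustration of modules (1) and (3), the following is a minimal sketch in plain Python, not the authors' implementation. The abstract does not say how the learnable tensor is combined with the patch sequence or how the slices are formed, so the sketch assumes element-wise addition and horizontal stripes over a row-major patch grid. The names `add_learnable_tensor`, `spatial_slices`, `grid_h`, `grid_w`, and `num_groups` are hypothetical.

```python
from typing import List

Patch = List[float]  # one patch embedding (a vector of floats)


def add_learnable_tensor(patches: List[Patch], tensor: List[Patch]) -> List[Patch]:
    """Combine a patch sequence with a same-sized tensor.

    Assumption: element-wise addition. The abstract only states that the
    tensor has the same size as the patch sequence.
    """
    if len(patches) != len(tensor):
        raise ValueError("tensor must have one row per patch")
    out = []
    for patch, row in zip(patches, tensor):
        if len(patch) != len(row):
            raise ValueError("tensor rows must match the embedding size")
        out.append([p + t for p, t in zip(patch, row)])
    return out


def spatial_slices(patches: List[Patch], grid_h: int, grid_w: int,
                   num_groups: int) -> List[List[Patch]]:
    """Group a row-major patch sequence into stripes of adjacent grid rows.

    Assumption: slicing is done along the vertical axis of the patch grid,
    so each group holds spatially neighbouring patches.
    """
    if len(patches) != grid_h * grid_w:
        raise ValueError("patch count must equal grid_h * grid_w")
    if grid_h % num_groups != 0:
        raise ValueError("grid_h must be divisible by num_groups")
    group_len = (grid_h // num_groups) * grid_w  # patches per stripe
    return [patches[i:i + group_len] for i in range(0, len(patches), group_len)]
```

In this sketch each group could then be processed separately (for example by its own attention layer) so that interactions stay within a local neighbourhood; that downstream step is likewise an assumption and is not shown.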
