Salient-to-Broad Transition for Video Person Re-identification

Because mainstream video person re-identification (re-id) methods make limited use of temporal relations, their frame-level attention regions tend to be partial and highly similar across frames. To address this problem, we propose a Salient-to-Broad Module (SBM) that gradually enlarges the attention regions. Specifically, in SBM, while earlier frames focus on the most salient regions, later frames tend to focus on broader regions. In this way, the additional information in the broader regions supplements the salient regions, yielding more powerful video-level representations. To further improve SBM, we introduce an Integration-and-Distribution Module (IDM) to enhance frame-level representations. IDM first integrates features from the entire feature space and then distributes the integrated features to each spatial location. SBM and IDM are mutually beneficial, since they enhance representations at the video level and the frame level, respectively. Extensive experiments on four prevalent benchmarks demonstrate the effectiveness and superiority of our method. The source code is available at https://github.com/baist/SINet.
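The abstract describes SBM and IDM only at a high level. The pure-Python sketch below illustrates the two ideas under loudly stated assumptions and is not the authors' implementation (that lives in the linked repository). For SBM, it assumes an erasing-style rule: each frame's spatial attention logits are a per-location saliency score minus a penalty on the attention that earlier frames have already placed there. For IDM, it assumes mean pooling over all spatial locations (integration) followed by a residual add back to every location (distribution). The function names, the channel-sum saliency score, and the `suppress_strength` parameter are all hypothetical.

```python
import math
from typing import List

Vector = List[float]
FrameMap = List[Vector]  # one frame: H*W spatial locations, each a C-dim feature


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def integrate_and_distribute(frame: FrameMap) -> FrameMap:
    """IDM-style sketch (assumption): integrate every spatial location into one
    global vector by mean pooling, then distribute it by adding it back to each
    location as a residual."""
    num_locs = len(frame)
    channels = len(frame[0])
    integrated = [sum(loc[c] for loc in frame) / num_locs for c in range(channels)]
    return [[loc[c] + integrated[c] for c in range(channels)] for loc in frame]


def salient_to_broad_attention(
    frames: List[FrameMap], suppress_strength: float = 4.0
) -> List[List[float]]:
    """SBM-style sketch (assumption): each frame's attention logits are its
    per-location saliency minus a penalty on the attention that earlier frames
    already spent there, so later frames spread toward broader regions."""
    num_locs = len(frames[0])
    spent = [0.0] * num_locs  # attention accumulated over earlier frames
    attention_maps = []
    for frame in frames:
        saliency = [sum(loc) for loc in frame]  # hypothetical saliency: channel sum
        logits = [s - suppress_strength * a for s, a in zip(saliency, spent)]
        attn = softmax(logits)
        attention_maps.append(attn)
        spent = [a + w for a, w in zip(spent, attn)]
    return attention_maps


def video_representation(
    frames: List[FrameMap], suppress_strength: float = 4.0
) -> Vector:
    """Attention-pool each frame with its SBM-style map, then average over time."""
    maps = salient_to_broad_attention(frames, suppress_strength)
    channels = len(frames[0][0])
    frame_vecs = [
        [sum(w * loc[c] for w, loc in zip(attn, frame)) for c in range(channels)]
        for attn, frame in zip(maps, frames)
    ]
    return [sum(vec[c] for vec in frame_vecs) / len(frame_vecs) for c in range(channels)]


if __name__ == "__main__":
    # Toy clip: three identical frames, 4 locations, 2 channels; location 0 is salient.
    frame = [[2.0, 2.0], [0.5, 0.5], [0.5, 0.5], [0.0, 0.0]]
    for t, attn in enumerate(salient_to_broad_attention([frame] * 3)):
        print(t, [round(w, 3) for w in attn])
```

On the toy clip, the first frame concentrates its attention on the salient location, and later frames shift weight toward the remaining locations. Penalizing already-spent attention in logit space is only one simple way to realize a salient-to-broad transition; setting `suppress_strength` to 0 recovers independent per-frame attention, which mirrors the "partial and highly similar" behavior the abstract attributes to mainstream methods.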
