An Implicit Attention Mechanism for Deep Learning Pedestrian Re-identification Frameworks

Attention is defined as the preparedness for the mental selection of certain aspects in a physical environment. In the computer vision domain, this mechanism is of most interest, as it helps to define the segments of an image/video that are critical for obtaining a specific decision. This paper introduces one 'implicit' attentional mechanism for deep learning frameworks, that provides simultaneously: 1) masks-free; and 2) foreground-focused samples for the inference phase. The main idea is to generate synthetic data composed of interleaved segments from the original learning set, while using class information only from specific segments. During the learning phase, the newly generated samples feed the network, keeping their label exclusively consistent with the identity from where the region-of-interest was cropped. Hence, as the model receives images of each identity with inconsistent unwanted areas, it naturally pays the most attention to the label consistent consistent regions, which we observed to be equivalent to learn an effective receptive field. During the test phase, samples are provided without any mask, and the network naturally disregards the detrimental information, which is the insight for the observed improvements in performance. As a proof-of-concept, we consider the challenging problem of pedestrian re-identification and compare the effectiveness of our solution to the state-of-the-art techniques in the well known Richly Annotated Pedestrian (RAP) dataset. The code is available at this https URL

[1]  Yi Yang,et al.  A Discriminatively Learned CNN Embedding for Person Reidentification , 2016, ACM Trans. Multim. Comput. Commun. Appl..

[2]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[3]  Yi Yang,et al.  Random Erasing Data Augmentation , 2017, AAAI.

[4]  Xiaogang Wang,et al.  HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Bodo Rosenhahn,et al.  Triplet-Based Deep Similarity Learning for Person Re-Identification , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[6]  Xiaogang Wang,et al.  Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[8]  Gang Wang,et al.  Gated Siamese Convolutional Neural Network Architecture for Human Re-identification , 2016, ECCV.

[9]  Muhittin Gokmen,et al.  Human Semantic Parsing for Person Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Hiroshi Inoue,et al.  Data Augmentation by Pairing Samples for Images Classification , 2018, ArXiv.

[11]  Luc Van Gool,et al.  Pose Guided Person Image Generation , 2017, NIPS.

[12]  Wei Jiang,et al.  SphereReID: Deep Hypersphere Manifold Embedding for Person Re-Identification , 2018, J. Vis. Commun. Image Represent..

[13]  Xiaogang Wang,et al.  DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Cewu Lu,et al.  RMPE: Regional Multi-person Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Taghi M. Khoshgoftaar,et al.  A survey on Image Data Augmentation for Deep Learning , 2019, Journal of Big Data.

[16]  Shuicheng Yan,et al.  End-to-End Comparative Attention Networks for Person Re-Identification , 2016, IEEE Transactions on Image Processing.

[17]  Wei Jiang,et al.  Bag of Tricks and a Strong Baseline for Deep Person Re-Identification , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[18]  Shaogang Gong,et al.  Harmonious Attention Network for Person Re-identification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Jesús Martínez del Rincón,et al.  Recurrent Convolutional Network for Video-Based Person Re-identification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Kaiqi Huang,et al.  A Richly Annotated Pedestrian Dataset for Person Retrieval in Real Surveillance Scenarios , 2019, IEEE Transactions on Image Processing.

[21]  Qi Tian,et al.  Beyond Part Models: Person Retrieval with Refined Part Pooling , 2017, ECCV.

[22]  Zhedong Zheng,et al.  CamStyle: A Novel Data Augmentation Method for Person Re-Identification , 2019, IEEE Transactions on Image Processing.

[23]  Yi Yang,et al.  Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Misha Denil,et al.  Learning Where to Attend with Deep Architectures for Image Tracking , 2011, Neural Computation.

[25]  Alexandru Telea,et al.  An Image Inpainting Technique Based on the Fast Marching Method , 2004, J. Graphics, GPU, & Game Tools.