Fine-Grained Spatial Alignment Model for Person Re-Identification With Focal Triplet Loss

Recent advances in person re-identification have demonstrated the value of human body cues for boosting performance. However, most existing methods still rely on relatively coarse-grained local information. Such information may include redundant background, which makes these methods sensitive to similar-looking persons in challenging scenarios such as complex poses, inaccurate detection, occlusion, and misalignment. In this paper, we propose a novel Fine-Grained Spatial Alignment Model (FGSAM) that mines fine-grained local information to handle these challenges effectively. In particular, we first design a pose resolve net with channel parse blocks (CPB) to extract pose information at the pixel level. This network makes the proposed model robust to complex pose variations while suppressing the redundant background caused by inaccurate detection and occlusion. Given the extracted pose information, a locally reinforced alignment mode is further proposed to address misalignment between local parts by considering the different local parts together with attribute information in a fine-grained way. Finally, a focal triplet loss is designed to train the entire model effectively; it imposes an intra-class constraint and uses an adaptive weight adjustment mechanism to handle hard samples. Extensive evaluations and analysis on the Market1501, DukeMTMC-reid, and PETA datasets demonstrate the effectiveness of FGSAM in coping with misalignment, occlusion, and complex poses.
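
The abstract does not give the exact form of the focal triplet loss, so the following is only an illustrative sketch of the two ingredients it names: an intra-class constraint and an adaptive weight that emphasizes hard samples. The function name, the sigmoid-based weight, and the parameters `margin`, `intra_margin`, `gamma`, and `beta` are all assumptions made for illustration, not the formulation used in FGSAM.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def focal_triplet_loss_sketch(anchor, positive, negative,
                              margin=0.3, intra_margin=0.1,
                              gamma=2.0, beta=1.0):
    """Hypothetical focal-style triplet loss for one triplet.

    All hyperparameters and the weighting scheme are assumptions;
    the paper's actual definition may differ.
    """
    d_ap = euclidean(anchor, positive)  # anchor-positive distance
    d_an = euclidean(anchor, negative)  # anchor-negative distance

    # Standard triplet ranking term: the positive should be closer to the
    # anchor than the negative by at least `margin`.
    rank_term = max(0.0, d_ap - d_an + margin)

    # Assumed intra-class constraint: additionally penalize positives that
    # lie farther than `intra_margin` from the anchor.
    intra_term = max(0.0, d_ap - intra_margin)

    # Assumed focal-style weight: p in (0, 1) measures how well separated
    # the triplet is; (1 - p) ** gamma down-weights easy triplets.
    p = 1.0 / (1.0 + math.exp(-(d_an - d_ap)))
    weight = (1.0 - p) ** gamma

    return weight * (rank_term + beta * intra_term)


# An easy triplet (positive close, negative far) and a hard one (reversed).
easy = focal_triplet_loss_sketch([0.0, 0.0], [0.05, 0.0], [2.0, 0.0])
hard = focal_triplet_loss_sketch([0.0, 0.0], [1.0, 0.0], [0.5, 0.0])
```

In this sketch, setting `gamma=0` removes the adaptive weighting and leaves a plain triplet loss plus the intra-class term, which makes the effect of the weighting easy to isolate.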
