Image-to-video person re-identification using three-dimensional semantic appearance alignment and cross-modal interactive learning

Abstract: Image-to-video person re-identification (I2V ReID), which aims to retrieve human targets by matching image-based queries against video-based galleries, has recently become a new research focus. However, I2V ReID remains challenging because pose variations, camera views, misdetections, and differing data types cause appearance misalignment and modality misalignment between images and videos. To this end, we propose a deep I2V ReID pipeline based on three-dimensional semantic appearance alignment (3D-SAA) and cross-modal interactive learning (CMIL) to address these two challenges. Specifically, in the 3D-SAA module, aligned local appearance images extracted by dense 3D human appearance estimation are combined with global image and video embedding streams to learn more fine-grained identity features. The aligned local appearance images are further semantically aggregated by the proposed multi-branch aggregation network to weaken the influence of negligible body parts. Moreover, to overcome the influence of modality misalignment, the CMIL module enables communication between the global image and video streams by interactively propagating the temporal information in videos to the channels of image feature maps. Extensive experiments on the challenging MARS, DukeMTMC-VideoReID, and iLIDS-VID datasets show the superiority of our approach.
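To make the idea of propagating temporal information to the channels of image feature maps more concrete, the following is a minimal sketch of one way such an interaction could look. It is not the CMIL module itself: the abstract does not specify the mechanism, so the temporal mean pooling, the sigmoid gating, and the function names `temporal_channel_gate` and `modulate_image_feature` are all assumptions made for illustration.

```python
import math


def temporal_channel_gate(video_feats):
    """Summarize a video as one gate per channel.

    video_feats: list of T frame descriptors, each a list of C floats.
    Assumption: the temporal summary is a plain mean over frames,
    squashed to (0, 1) with a sigmoid. The paper may use something else.
    """
    num_frames = len(video_feats)
    num_channels = len(video_feats[0])
    means = [
        sum(frame[c] for frame in video_feats) / num_frames
        for c in range(num_channels)
    ]
    return [1.0 / (1.0 + math.exp(-m)) for m in means]


def modulate_image_feature(image_map, gates):
    """Scale each channel of an image feature map by its gate.

    image_map: list of C channels, each an H x W nested list of floats.
    gates: list of C floats, one per channel.
    """
    return [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(image_map, gates)
    ]


# Toy example: 2 frames, 2 channels, and a 1 x 1 image feature map.
video_feats = [[0.0, 2.0], [0.0, -2.0]]
image_map = [[[2.0]], [[4.0]]]
gates = temporal_channel_gate(video_feats)
modulated = modulate_image_feature(image_map, gates)
```

In this toy setting the video statistics only rescale image channels. An interactive design would presumably also pass information from the image stream back to the video stream, which this sketch leaves out.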
