A Flow-Guided Mutual Attention Network for Video-Based Person Re-Identification

Person Re-Identification (ReID) is a challenging problem in many video analytics and surveillance applications, where a person's identity must be associated across a distributed network of non-overlapping cameras. Video-based person ReID has recently gained much interest because it can capture discriminant spatio-temporal information from video clips that is unavailable for image-based ReID. Despite recent advances, deep learning (DL) models for video ReID often fail to leverage this information to improve the robustness of feature representations. In this paper, the motion pattern of a person is explored as an additional cue for ReID. In particular, a flow-guided Mutual Attention network is proposed for fusing image and optical flow sequences using any 2D-CNN backbone, which allows temporal information to be encoded along with spatial appearance information. Our Mutual Attention network relies on joint spatial attention between image and optical flow feature maps to activate a common set of salient features across them. In addition to flow-guided attention, we introduce a method to aggregate features from longer input streams for a better video sequence-level representation. Our extensive experiments on three challenging video ReID datasets indicate that the proposed Mutual Attention network considerably improves recognition accuracy with respect to conventional gated-attention networks and state-of-the-art methods for video-based person ReID.
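The idea of joint spatial attention between the two streams can be illustrated with a minimal sketch. This is not the paper's formulation, which the abstract does not specify; it is an assumed toy variant in which each stream produces a spatial map (sigmoid of the channel mean), the two maps are multiplied to form a joint map, and that joint map reweights both streams. The function names `spatial_attention` and `mutual_attention` and the choice of channel-mean pooling are hypothetical.

```python
import math


def spatial_attention(feat):
    """Collapse a C x H x W feature map (nested lists) to an H x W map in (0, 1).

    Assumption: sigmoid of the per-location channel mean; the paper may use
    a learned layer instead.
    """
    c = len(feat)
    h, w = len(feat[0]), len(feat[0][0])
    return [
        [
            1.0 / (1.0 + math.exp(-sum(feat[k][i][j] for k in range(c)) / c))
            for j in range(w)
        ]
        for i in range(h)
    ]


def mutual_attention(img_feat, flow_feat):
    """Reweight both streams with one joint spatial map (illustrative only)."""
    a_img = spatial_attention(img_feat)
    a_flow = spatial_attention(flow_feat)
    h, w = len(a_img), len(a_img[0])

    # A location stays active only if both streams consider it salient.
    joint = [[a_img[i][j] * a_flow[i][j] for j in range(w)] for i in range(h)]

    def apply(feat):
        return [
            [[ch[i][j] * joint[i][j] for j in range(w)] for i in range(h)]
            for ch in feat
        ]

    return apply(img_feat), apply(flow_feat), joint


# Toy input: one channel, one row, two spatial locations per stream.
img = [[[0.0, 4.0]]]
flow = [[[0.0, 4.0]]]
img_out, flow_out, joint = mutual_attention(img, flow)
print(joint[0][0])  # 0.25, since sigmoid(0) * sigmoid(0) = 0.5 * 0.5
```

Multiplying the two maps is one simple way to make the attention mutual: a location that only one stream finds salient is suppressed in both outputs, which matches the abstract's description of activating a common set of salient features.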
