Image-to-Video Person Re-Identification With Temporally Memorized Similarity Learning

With the growth of video surveillance in public safety, person re-identification (re-id) has attracted increasing research interest. In this paper, we address image-to-video person re-id, in which the probe is an image and the gallery consists of videos captured by non-overlapping cameras. Compared with a single image, a video sequence contains temporal information that can be exploited to improve re-identification performance. However, modeling temporal information in the image-to-video matching process is challenging. We propose a novel temporally memorized similarity learning neural network for this problem. Specifically, the proposed network consists of two parts: a feature representation sub-network and a similarity sub-network. In the first part, a convolutional neural network (CNN) extracts features from the input image. Given a video sequence of a person, features are first extracted from each of its frames by the CNN and then forwarded to a long short-term memory (LSTM) network to encode the temporal information of the sequence. The outputs of the LSTM are concatenated to form the feature vector of the video sequence. Finally, the feature vectors of the probe image and the video sequence are forwarded to the similarity sub-network for distance metric learning. In the proposed framework, the feature representation and the similarity metric are learned and optimized simultaneously. We evaluate the proposed framework on three public person re-id data sets, and the experimental results show that the proposed approach is effective for image-to-video person re-id.
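The data flow described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the CNN is replaced by a single linear map with ReLU as a stand-in, the similarity sub-network is a hypothetical projection-and-difference scorer, and all names, dimensions, and random (untrained) weights are assumptions made only to show how an image feature and a concatenated LSTM video feature might be paired.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cnn_features(frame, W):
    # Stand-in for the CNN (assumption): one linear map plus ReLU on the flattened frame.
    return np.maximum(W @ frame.ravel(), 0.0)

def lstm_encode(frame_feats, Wx, Wh, b):
    # Standard LSTM over per-frame features; the hidden states of all
    # time steps are concatenated into one video feature vector.
    hidden = Wh.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x in frame_feats:
        gates = Wx @ x + Wh @ h + b          # shape (4 * hidden,)
        i, f, o, g = np.split(gates, 4)      # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.concatenate(outputs)

def similarity(img_feat, vid_feat, Wi, Wv, w_out):
    # Hypothetical similarity sub-network: project both features into a
    # common space and score the element-wise absolute difference.
    a = np.tanh(Wi @ img_feat)
    v = np.tanh(Wv @ vid_feat)
    return float(sigmoid(w_out @ np.abs(a - v)))

# Illustrative sizes (assumptions): T frames, D CNN features, H LSTM units, E common dims.
T, D, H, E = 5, 32, 8, 16
frame_shape = (16, 8, 3)
W_cnn = rng.normal(0.0, 0.05, (D, int(np.prod(frame_shape))))
Wx = rng.normal(0.0, 0.1, (4 * H, D))
Wh = rng.normal(0.0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
Wi = rng.normal(0.0, 0.1, (E, D))
Wv = rng.normal(0.0, 0.1, (E, T * H))
w_out = rng.normal(0.0, 0.1, E)

probe = rng.random(frame_shape)        # probe image
video = rng.random((T, *frame_shape))  # gallery video with T frames

img_feat = cnn_features(probe, W_cnn)
vid_feat = lstm_encode([cnn_features(fr, W_cnn) for fr in video], Wx, Wh, b)
score = similarity(img_feat, vid_feat, Wi, Wv, w_out)
print(img_feat.shape, vid_feat.shape, score)
```

In a trained system, every weight above would be optimized jointly from a matching loss, which is the sense in which the abstract says feature representation and similarity metric are learned simultaneously.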
