Paper Information - Learning Bidirectional Temporal Cues for Video-Based Person Re-Identification

This paper presents an end-to-end learning architecture for video-based person re-identification that integrates convolutional neural networks (CNNs) with bidirectional recurrent neural networks (BRNNs). Given a video of consecutive frames, a CNN extracts features from each frame, and these features are fed into the BRNN to produce a final spatio-temporal representation of the video. Specifically, the CNN acts as a spatial feature extractor, while the BRNN is expected to capture the temporal cues of the frame sequence in both the forward and backward directions simultaneously. The whole network is trained end-to-end under joint identification and verification supervision. Experimental results on benchmark data sets show that the proposed model effectively learns spatio-temporal features relevant to re-identification and outperforms existing video-based person re-identification methods.
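The pipeline described in the abstract can be illustrated with a minimal, forward-only sketch. It is not the authors' implementation. Several choices are assumptions made purely for illustration: random vectors stand in for per-frame CNN features, the recurrent unit is a plain tanh RNN, the per-frame states are averaged over time, and the verification term is a contrastive loss. All names and sizes (`brnn_video_feature`, `FEAT`, `HID`, and so on) are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rnn_pass(frames, W_in, W_rec):
    """Run a plain tanh RNN over the frames; return one hidden state per frame."""
    h = [0.0] * len(W_rec)
    states = []
    for x in frames:
        a, b = matvec(W_in, x), matvec(W_rec, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        states.append(h)
    return states

def brnn_video_feature(frames, fwd, bwd):
    """Concatenate forward and backward states per frame, then average over time."""
    f_states = rnn_pass(frames, *fwd)
    b_states = rnn_pass(frames[::-1], *bwd)[::-1]  # re-align to frame order
    per_frame = [f + b for f, b in zip(f_states, b_states)]
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]

def identification_loss(feature, W_cls, label):
    """Softmax cross-entropy over identity classes."""
    logits = matvec(W_cls, feature)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

def verification_loss(f1, f2, same, margin=2.0):
    """Contrastive loss on the Euclidean distance (an assumed choice)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
    return 0.5 * d ** 2 if same else 0.5 * max(0.0, margin - d) ** 2

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
FEAT, HID, IDS, T = 8, 4, 3, 5  # hypothetical sizes
fwd = (random_matrix(HID, FEAT, rng), random_matrix(HID, HID, rng))
bwd = (random_matrix(HID, FEAT, rng), random_matrix(HID, HID, rng))
W_cls = random_matrix(IDS, 2 * HID, rng)

# Random vectors stand in for CNN features of T consecutive frames.
video_a = [[rng.uniform(-1, 1) for _ in range(FEAT)] for _ in range(T)]
video_b = [[rng.uniform(-1, 1) for _ in range(FEAT)] for _ in range(T)]

feat_a = brnn_video_feature(video_a, fwd, bwd)
feat_b = brnn_video_feature(video_b, fwd, bwd)

# Joint objective: identification on each video plus verification on the pair.
joint = (identification_loss(feat_a, W_cls, 0)
         + identification_loss(feat_b, W_cls, 1)
         + verification_loss(feat_a, feat_b, same=False))
```

In this sketch the backward pass reads the frames in reverse and its states are flipped back into frame order, so each per-frame vector combines context from both earlier and later frames before pooling. A trainable version would replace the random features with a CNN and backpropagate the joint loss through both networks.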
