Paper information - Deep spatio-temporal network for accurate person re-identification

Feature extraction is one of the two core tasks of person re-identification, the other being metric learning. Building an effective feature extractor is a common goal of research in the field. In this work, we propose a deep spatio-temporal network model that consists of a VGG-16 network as the spatial feature extractor and a GRU network as the image sequence descriptor. Two temporal pooling techniques are investigated to produce a compact yet discriminative sequence-level representation from a sequence of arbitrary length. To highlight the effectiveness of the final sequence-level feature set, we use cosine distance metric learning to find accurate probe-gallery pairs. Experimental results on the iLIDS-VID and PRID 2011 datasets show that our method slightly outperforms state-of-the-art methods on one dataset and significantly outperforms them on the other.
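
The last two stages described in the abstract, temporal pooling followed by cosine-distance matching, can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not name the two pooling techniques, so mean and max pooling are used here as two common choices; the per-frame vectors are toy placeholders standing in for the VGG-16 + GRU outputs; and the matching step uses a plain, unlearned cosine distance rather than the learned metric of the paper. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average per-frame features over time (assumed pooling choice)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def max_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Take the element-wise maximum over time (assumed pooling choice)."""
    return [max(column) for column in zip(*frames)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; plain distance, no learned metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # convention chosen here for zero vectors
    return 1.0 - dot / (norm_a * norm_b)


def rank_gallery(probe: Sequence[float], gallery: Dict[str, Vector]) -> List[str]:
    """Sort gallery identities from closest to farthest from the probe."""
    return sorted(gallery, key=lambda pid: cosine_distance(probe, gallery[pid]))


# Toy per-frame features; sequences may have different lengths.
probe_frames = [[1.0, 0.0], [3.0, 2.0]]
gallery_frames = {
    "person_a": [[2.0, 1.0], [2.0, 1.0], [2.0, 1.0]],
    "person_b": [[0.0, 1.0], [0.0, 3.0]],
}

probe = mean_pool(probe_frames)
gallery = {pid: mean_pool(frames) for pid, frames in gallery_frames.items()}
ranking = rank_gallery(probe, gallery)
```

Both pooling functions map a sequence of any length to a vector of fixed size, which is what allows probe and gallery sequences with different frame counts to be compared directly.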

Cuong Vo Le | Trung Tran Quang | Dung Nguyen Tien | Quan Nguyen Hong | Nghia Nguyen Tuan
