Deep Spatial-Temporal Fusion Network for Video-Based Person Re-identification

In this paper, we propose a novel deep end-to-end network to automatically learn the spatial-temporal fusion features for video-based person re-identification. Specifically, the proposed network consists of CNN and RNN to jointly learn both the spatial and the temporal features of input image sequences. The network is optimized by utilizing the siamese and softmax losses simultaneously to pull the instances of the same person closer and push the instances of different persons apart. Our network is trained on full-body and part-body image sequences respectively to learn complementary representations from holistic and local perspectives. By combining them together, we obtain more discriminative features that are beneficial to person re-identification. Experiments conducted on the PRID-2011, i-LIDS-VIS and MARS datasets show that the proposed method performs favorably against existing approaches.

[1]  Xiang Li,et al.  Top-Push Video-Based Person Re-identification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Xiaogang Wang,et al.  Person Re-identification by Salience Matching , 2013, 2013 IEEE International Conference on Computer Vision.

[3]  Larry S. Davis,et al.  Recognizing actions by shape-motion prototype trees , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Lin Wu,et al.  Deep Recurrent Convolutional Networks for Video-based Person Re-identification: An End-to-End Approach , 2016, ArXiv.

[5]  Sergio A. Velastin,et al.  Re-identification of Pedestrians in Crowds Using Dynamic Time Warping , 2012, ECCV Workshops.

[6]  Yang Li,et al.  Multi-Shot Human Re-Identification Using Adaptive Fisher Discriminant Analysis , 2015, BMVC.

[7]  Niall McLaughlin,et al.  Data-augmentation for reducing dataset bias in person re-identification , 2015, 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[8]  Shengcai Liao,et al.  Embedding Deep Metric for Person Re-identification: A Study Against Large Variations , 2016, ECCV.

[9]  Jesús Martínez del Rincón,et al.  Recurrent Convolutional Network for Video-Based Person Re-identification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Xiaogang Wang,et al.  Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Ehud Rivlin,et al.  Color Invariants for Person Reidentification , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Qi Tian,et al.  MARS: A Video Benchmark for Large-Scale Person Re-Identification , 2016, ECCV.

[13]  Yang Li,et al.  Sparse re-id: Block sparsity for person re-identification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[14]  Shiliang Zhang,et al.  Deep Attributes Driven Multi-Camera Person Re-identification , 2016, ECCV.

[15]  Tieniu Tan,et al.  Human identification based on gait , 2005, The Kluwer international series on biometrics.

[16]  Xiaogang Wang,et al.  Unsupervised Salience Learning for Person Re-identification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Bir Bhanu,et al.  Individual recognition using gait energy image , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Qi Tian,et al.  Scalable Person Re-identification: A Benchmark , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Shaogang Gong,et al.  Associating Groups of People , 2009, BMVC.

[20]  Bingbing Ni,et al.  Person Re-identification via Recurrent Feature Aggregation , 2016, ECCV.

[21]  Liang Lin,et al.  Deep feature learning with relative distance comparison for person re-identification , 2015, Pattern Recognit..

[22]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  Tao Xiang,et al.  Gait Recognition by Ranking , 2012, ECCV.

[24]  Li Fei-Fei,et al.  Recurrent Attention Models for Depth-Based Person Identification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Horst Bischof,et al.  Person Re-identification by Descriptive and Discriminative Classification , 2011, SCIA.

[26]  Sudeep Sarkar,et al.  The humanID gait challenge problem: data sets, performance, and analysis , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Bingpeng Ma,et al.  A Spatio-Temporal Appearance Representation for Video-Based Pedestrian Re-Identification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28]  Shaogang Gong,et al.  Person Re-identification by Video Ranking , 2014, ECCV.

[29]  Michael Jones,et al.  An improved deep learning architecture for person re-identification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).