Deep Cross-Modality Alignment for Multi-Shot Person Re-IDentification

Multi-shot person Re-IDentification (Re-ID) has recently received increasing research attention because its problem setting is closer to real applications than that of single-shot Re-ID. While many large-scale single-shot Re-ID image datasets have been released, most existing multi-shot Re-ID video sequence datasets contain only a few (i.e., several hundred) human instances, which hinders further improvement of multi-shot Re-ID performance. To this end, we propose a deep cross-modality alignment network that jointly exploits human sequence pairs and image pairs to train better multi-shot Re-ID models, i.e., by transferring knowledge from image data to sequence data. To mitigate the mismatch between the two modalities, the proposed network is equipped with an image-to-sequence adaptation module, called the cross-modality alignment sub-network, which maps each human image into a pseudo human sequence to facilitate knowledge transfer and joint training. Extensive experimental results on several multi-shot person Re-ID benchmarks demonstrate the performance gain brought by the proposed network.
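The abstract does not specify how the cross-modality alignment sub-network is built. The sketch below only illustrates the general idea of mapping a single image feature into a pseudo sequence so that images and real sequences can share one sequence-level embedding. Everything in it is an assumption made for illustration: the per-frame linear maps, the tanh nonlinearity, the temporal average pooling, and all names (`image_to_pseudo_sequence`, `sequence_embedding`, `weights`, `biases`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def image_to_pseudo_sequence(image_feat, weights, biases):
    """Map one image feature of shape (d,) to a pseudo sequence of shape (T, d).

    weights (T, d, d) and biases (T, d) stand in for learned parameters;
    one linear map per pseudo frame is an illustrative assumption.
    """
    # For each pseudo frame t, compute weights[t] @ image_feat + biases[t].
    frames = np.einsum("tij,j->ti", weights, image_feat) + biases
    return np.tanh(frames)

def sequence_embedding(seq_feats):
    """Embed a sequence (T, d) by temporal average pooling and L2 normalisation."""
    pooled = seq_feats.mean(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-12)

# Toy data: random values stand in for CNN features of real images and frames.
rng = np.random.default_rng(0)
d, T = 8, 4
weights = rng.normal(scale=0.1, size=(T, d, d))
biases = np.zeros((T, d))

image_feat = rng.normal(size=d)      # feature of one single-shot image
real_seq = rng.normal(size=(6, d))   # per-frame features of one video sequence

pseudo_seq = image_to_pseudo_sequence(image_feat, weights, biases)

# Both modalities now pass through the same sequence-level embedding,
# so image pairs and sequence pairs could be compared or trained jointly.
img_emb = sequence_embedding(pseudo_seq)
seq_emb = sequence_embedding(real_seq)
distance = float(np.linalg.norm(img_emb - seq_emb))
```

In this toy setup an image and a video sequence end up in the same embedding space, which is the property that joint training on image pairs and sequence pairs relies on. In a trained model the parameters would be learned rather than sampled at random.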
