Few-Shot Deep Adversarial Learning for Video-Based Person Re-Identification

Video-based person re-identification (re-ID) refers to matching people across camera views from arbitrary, unaligned video footage. Existing methods rely on supervision signals to optimise a projected space in which inter-video distances are maximised and intra-video distances are minimised. However, this requires exhaustively labelling people across camera views, which prevents these methods from scaling to large camera networks. Moreover, learning effective video representations with view invariance is not explicitly addressed, so features from different views exhibit different distributions. Thus, matching videos for person re-ID demands flexible models that capture the dynamics in time-series observations and learn view-invariant representations from limited labelled training samples. In this paper, we propose a novel few-shot deep learning approach to video-based person re-ID that learns comparable representations which are both discriminative and view-invariant. The proposed method builds on variational recurrent neural networks (VRNNs) and is trained adversarially to produce latent variables with temporal dependencies that are highly discriminative yet view-invariant for matching persons. Through extensive experiments on three benchmark datasets, we empirically demonstrate that our method creates view-invariant temporal features and achieves state-of-the-art performance.
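
As a rough illustration of the two ingredients named above, the following sketch writes down the standard timestep-wise variational lower bound of a VRNN, followed by one possible adversarial term that encourages view invariance. The first expression is the usual VRNN objective; the second is only an assumed form, not the paper's actual loss. In particular, the view discriminator $D$, the camera-view label $v$, and the trade-off weight $\lambda > 0$ are hypothetical names introduced here, and the identity-discriminative term is omitted.

```latex
% Standard VRNN objective: timestep-wise variational lower bound for a sequence x_{1:T}
\mathcal{L}_{\mathrm{VRNN}}(\theta,\phi)
  = \mathbb{E}_{q_\phi(z_{\le T}\mid x_{\le T})}
    \Big[ \sum_{t=1}^{T} \Big(
      \log p_\theta(x_t \mid z_{\le t}, x_{<t})
      - \mathrm{KL}\big( q_\phi(z_t \mid x_{\le t}, z_{<t}) \,\big\|\, p_\theta(z_t \mid x_{<t}, z_{<t}) \big)
    \Big) \Big]

% Assumed (hypothetical) adversarial game: D tries to predict the camera view v
% from each latent z_t, while the VRNN parameters are trained to make that hard
\min_{\theta,\phi}\ \max_{D}\;
  -\mathcal{L}_{\mathrm{VRNN}}(\theta,\phi)
  + \lambda\, \mathbb{E}\Big[ \sum_{t=1}^{T} \log D(v \mid z_t) \Big]
```

Under this reading, the discriminator maximises the log-likelihood of the correct view, while the encoder both maximises the lower bound and minimises the discriminator's success, so that the latent variables $z_t$ carry little information about which camera produced the video.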
