PROVID: Progressive and Multimodal Vehicle Reidentification for Large-Scale Urban Surveillance

Compared with person reidentification, which has attracted concentrated attention, vehicle reidentification is an important yet frontier problem in video surveillance and has been neglected by the multimedia and vision communities. Since most existing approaches mainly consider the general vehicle appearance for reidentification while overlooking the distinct vehicle identifier, such as the license plate number, they attain suboptimal performance. In this paper, we propose PROVID, a PROgressive Vehicle re-IDentification framework based on deep neural networks. In particular, our framework not only utilizes the multimodality data in large-scale video surveillance, such as visual features, license plates, camera locations, and contextual information, but also considers vehicle reidentification in two progressive procedures: coarse-to-fine search in the feature domain, and near-to-distant search in the physical space. Furthermore, to evaluate our progressive search framework and facilitate related research, we construct the VeRi dataset, which is the most comprehensive dataset from real-world surveillance videos. It not only provides large numbers of vehicles with varied labels and sufficient cross-camera recurrences but also contains license plate numbers and contextual information. Extensive experiments on the VeRi dataset demonstrate both the accuracy and efficiency of our progressive vehicle reidentification framework.

[1]  Sharath Pankanti,et al.  Large-Scale Vehicle Detection, Indexing, and Search in Urban Surveillance Videos , 2012, IEEE Transactions on Multimedia.

[2]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[3]  Carlos Busso,et al.  Modeling of Driver Behavior in Real World Scenarios Using Multiple Noninvasive Sensors , 2013, IEEE Transactions on Multimedia.

[4]  Shengcai Liao,et al.  Person re-identification by Local Maximal Occurrence representation and metric learning , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Ramin Zabih,et al.  Bayesian multi-camera surveillance , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[6]  Mubarak Shah,et al.  Modeling inter-camera space-time and appearance relationships for tracking across non-overlapping views , 2008, Comput. Vis. Image Underst..

[7]  Fei-Yue Wang,et al.  Data-Driven Intelligent Transportation Systems: A Survey , 2011, IEEE Transactions on Intelligent Transportation Systems.

[8]  Zi Huang,et al.  Effective Multiple Feature Hashing for Large-Scale Near-Duplicate Video Retrieval , 2013, IEEE Transactions on Multimedia.

[9]  John W. Sammon,et al.  An Optimal Set of Discriminant Vectors , 1975, IEEE Transactions on Computers.

[10]  Wu Liu,et al.  A Progressive Search Paradigm for the Internet of Things , 2018, IEEE MultiMedia.

[11]  B. S. Manjunath,et al.  Graph-Based Topic-Focused Retrieval in Distributed Camera Network , 2013, IEEE Transactions on Multimedia.

[12]  B. S. Manjunath,et al.  Context-Aware Hypergraph Modeling for Re-identification and Summarization , 2016, IEEE Transactions on Multimedia.

[13]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[14]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Tao Mei,et al.  A Deep Learning-Based Approach to Progressive Vehicle Re-identification for Urban Surveillance , 2016, ECCV.

[17]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[18]  Licia Capra,et al.  Urban Computing: Concepts, Methodologies, and Applications , 2014, TIST.

[19]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[20]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[21]  Cordelia Schmid,et al.  Learning Color Names for Real-World Applications , 2009, IEEE Transactions on Image Processing.

[22]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[23]  Supun Samarasekera,et al.  Vehicle tracking across nonoverlapping cameras using joint kinematic and appearance features , 2011, CVPR 2011.

[24]  Sergio A. Velastin,et al.  Intelligent distributed surveillance systems: a review , 2005 .

[25]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[26]  Xiangyang Xue,et al.  Null Foley-Sammon transform , 2006, Pattern Recognit..

[27]  Wu Liu,et al.  Siamese neural network based gait recognition for human identification , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[28]  Shaogang Gong,et al.  Learning a Discriminative Null Space for Person Re-identification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Yongdong Zhang,et al.  Multi-task deep visual-semantic embedding for video thumbnail selection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Qi Tian,et al.  Fine-Grained Image Search , 2015, IEEE Transactions on Multimedia.

[31]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[32]  Li Li,et al.  A Survey on Visual Content-Based Video Indexing and Retrieval , 2011, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[33]  Gang Wang,et al.  Object Instance Search in Videos via Spatio-Temporal Trajectory Discovery , 2016, IEEE Transactions on Multimedia.

[34]  Xiaoou Tang,et al.  A large-scale car dataset for fine-grained categorization and verification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Tiejun Huang,et al.  Deep Relative Distance Learning: Tell the Difference between Similar Vehicles , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[37]  Wael Badawy,et al.  Automatic License Plate Recognition (ALPR): A State-of-the-Art Review , 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[38]  Qi Tian,et al.  Scalable Person Re-identification: A Benchmark , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[39]  Wu Liu,et al.  Large-scale vehicle re-identification in urban surveillance videos , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).

[40]  Adam Herout,et al.  Vehicle Re-identification for Automatic Video Traffic Surveillance , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[41]  Qi Tian,et al.  Multimedia search reranking: A literature survey , 2014, CSUR.

[42]  Pengfei Shi,et al.  An Algorithm for License Plate Recognition Applied to Intelligent Transportation System , 2011, IEEE Transactions on Intelligent Transportation Systems.