论文信息 - A Deep Learning-Based Approach to Progressive Vehicle Re-identification for Urban Surveillance

A Deep Learning-Based Approach to Progressive Vehicle Re-identification for Urban Surveillance

While re-identification (Re-Id) of persons has attracted intensive attention, vehicle, which is a significant object class in urban video surveillance, is often overlooked by vision community. Most existing methods for vehicle Re-Id only achieve limited performance, as they predominantly focus on the generic appearance of vehicle while neglecting some unique identities of vehicle (e.g., license plate). In this paper, we propose a novel deep learning-based approach to PROgressive Vehicle re-ID, called “PROVID”. Our approach treats vehicle Re-Id as two specific progressive search processes: coarse-to-fine search in the feature space, and near-to-distant search in the real world surveillance environment. The first search process employs the appearance attributes of vehicle for a coarse filtering, and then exploits the Siamese Neural Network for license plate verification to accurately identify vehicles. The near-to-distant search process retrieves vehicles in a manner like human beings, by searching from near to faraway cameras and from close to distant time. Moreover, to facilitate progressive vehicle Re-Id research, we collect to-date the largest dataset named VeRi-776 from large-scale urban surveillance videos, which contains not only massive vehicles with diverse attributes and high recurrence rate, but also sufficient license plates and spatiotemporal labels. A comprehensive evaluation on the VeRi-776 shows that our approach outperforms the state-of-the-art methods by 9.28 % improvements in term of mAP.

[1] Yann LeCun,et al. Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[2] Ramin Zabih,et al. Bayesian multi-camera surveillance , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[3] Andrew Zisserman,et al. Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[4] Sergio A. Velastin,et al. Intelligent distributed surveillance systems: a review , 2005 .

[5] Yann LeCun,et al. Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[6] Mubarak Shah,et al. Modeling inter-camera space-time and appearance relationships for tracking across non-overlapping views , 2008, Comput. Vis. Image Underst..

[7] Cordelia Schmid,et al. Learning Color Names for Real-World Applications , 2009, IEEE Transactions on Image Processing.

[8] Fei-Yue Wang,et al. Data-Driven Intelligent Transportation Systems: A Survey , 2011, IEEE Transactions on Intelligent Transportation Systems.

[9] Pengfei Shi,et al. An Algorithm for License Plate Recognition Applied to Intelligent Transportation System , 2011, IEEE Transactions on Intelligent Transportation Systems.

[10] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[11] Supun Samarasekera,et al. Vehicle tracking across nonoverlapping cameras using joint kinematic and appearance features , 2011, CVPR 2011.

[12] Sharath Pankanti,et al. Large-Scale Vehicle Detection, Indexing, and Search in Urban Surveillance Videos , 2012, IEEE Transactions on Multimedia.

[13] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[14] Hui Xiong,et al. Introduction to special section on intelligent mobile knowledge discovery and management systems , 2013, ACM Trans. Intell. Syst. Technol..

[15] Wael Badawy,et al. Automatic License Plate Recognition (ALPR): A State-of-the-Art Review , 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[16] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[17] B. S. Manjunath,et al. Graph-Based Topic-Focused Retrieval in Distributed Camera Network , 2013, IEEE Transactions on Multimedia.

[18] Qi Tian,et al. Multimedia search reranking: A literature survey , 2014, CSUR.

[19] Licia Capra,et al. Urban Computing: Concepts, Methodologies, and Applications , 2014, TIST.

[20] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[21] Xiaoou Tang,et al. A large-scale car dataset for fine-grained categorization and verification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Yongdong Zhang,et al. Multi-task deep visual-semantic embedding for video thumbnail selection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[24] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Qi Tian,et al. Scalable Person Re-identification: A Benchmark , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26] Shengcai Liao,et al. Person re-identification by Local Maximal Occurrence representation and metric learning , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Wu Liu,et al. Siamese neural network based gait recognition for human identification , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[28] Wu Liu,et al. Large-scale vehicle re-identification in urban surveillance videos , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).