Deep Learning-Based Computer Vision for Surveillance in ITS: Evaluation of State-of-the-Art Methods

An intelligent transportation system (ITS) collects large volumes of data for analyzing the transportation system. These data can be used to provide services to travellers and traffic controllers and to optimize the ITS itself, making transportation more efficient and safer. Owing to the wide and flexible deployment of video cameras in visual surveillance systems (VSS), mature edge-cloud resource scheduling for data transmission and analysis, and the rapid development of deep learning, computer vision (CV) methods have been successfully employed in vision-based ITS services. In this paper, we discuss edge-cloud surveillance resource scheduling for CV methods and review deep learning-based CV methods in the VSS, including detection, classification, and tracking methods, to clarify the relationship between CV-based ITS services and these methods. We experimentally compare several state-of-the-art deep learning-based methods that have been successfully applied in CV, under the ITS scenario, in terms of performance, inference speed, computational cost, and model size. Based on these comparisons, we identify four main challenges for deep learning-based CV methods applied in these services and discuss them as future research directions. Code is available at https://github.com/PRIS-CV/DL-CV-ITS.
