Inception single shot multi-box detector with affinity propagation clustering and their application in multi-class vehicle counting

Achieving real-time performance with acceptable precision is challenging for multi-class vehicle detection and counting in video-based traffic surveillance systems. This paper proposes a modified single shot multi-box convolutional neural network, named Inception-SSD (ISSD), for vehicle detection, and a centroid matching algorithm for vehicle counting. An Inception-like block replaces the extra feature layers of the original SSD to handle multi-scale vehicle detection and to improve the detection of smaller vehicles. Non-Maximum Suppression (NMS) is replaced with Affinity Propagation Clustering (APC) to improve the detection of nearby occluded vehicles. For a 300 × 300 input image, the proposed ISSD achieves a mean Average Precision (mAP) of 79.3 on the PASCAL VOC 2007 test set and runs at 52.3 frames per second on an NVIDIA RTX2080Ti. ISSD with APC yields a 2.7% improvement in mAP over the original SSD300 while almost retaining its time efficiency. Using the centroid matching algorithm, vehicles are counted class-wise with a weighted F1 of 98.5%, which is higher than the results reported in other recent work.
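The abstract does not spell out the centroid matching step, so the following is only a minimal sketch of one common way such class-wise counting can be done, not the authors' implementation. It assumes detections arrive per frame as (label, box) pairs, that a vehicle is counted when its matched centroid crosses a horizontal counting line while moving downward, and that matching is greedy nearest-centroid within the same class. The names `count_vehicles`, `line_y`, and `max_dist` are hypothetical.

```python
from collections import Counter
from math import hypot


def centroid(box):
    """Return the centre (cx, cy) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def count_vehicles(frames, line_y, max_dist=50.0):
    """Count vehicles per class as their centroids cross y = line_y.

    frames:   list of frames; each frame is a list of (label, box) detections.
    line_y:   y coordinate of the (assumed) horizontal counting line.
    max_dist: largest centroid displacement accepted as the same vehicle.
    """
    counts = Counter()
    prev = []  # (label, centroid) pairs from the previous frame
    for detections in frames:
        current = [(label, centroid(box)) for label, box in detections]
        used = set()  # previous-frame indices that are already matched
        for label, (cx, cy) in current:
            best, best_d = None, max_dist
            # Greedy search for the nearest unmatched centroid of the same class.
            for i, (plabel, (px, py)) in enumerate(prev):
                if i in used or plabel != label:
                    continue
                d = hypot(cx - px, cy - py)
                if d < best_d:
                    best, best_d = i, d
            if best is None:
                continue  # no match: treated as a newly appeared vehicle
            used.add(best)
            prev_y = prev[best][1][1]
            # Count once, on the frame where the centroid passes the line.
            if prev_y < line_y <= cy:
                counts[label] += 1
        prev = current
    return counts


if __name__ == "__main__":
    demo = [
        [("car", (0, 80, 20, 100))],
        [("car", (0, 100, 20, 120))],
    ]
    print(count_vehicles(demo, line_y=100))
```

Greedy matching keeps the sketch short; a practical system would likely need to handle missed detections across frames and both travel directions.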
