Refining Yolov4 for Vehicle Detection

Real-time vehicle detection is used in applications such as self-driving cars and traffic camera surveillance. Every year brings better state-of-the-art (SOTA) object detectors, but because these are trained on general-purpose datasets (such as MS COCO), they miss out on improvements targeted at vehicular data. The aim of this paper is to improve the newly released YOLOv4 detector specifically for vehicle tracking applications, using existing methods such as optimising anchor box predictions with k-means clustering. We also carefully hand-pick and verify some key techniques mentioned in the original paper to optimise YOLOv4 for the requirements of our dataset (UA-DETRAC).

We compare our fine-tuned model with existing models on a number of performance metrics: precision, recall, F1 score, mean average precision (mAP), and average IoU. Our experimental results show that the SOTA model, which already has real-time object detection capabilities, can be further improved for highly targeted use cases. We urge readers to expand the scope of the paper (and the original model) to other specific situations as well.
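To illustrate the anchor box optimisation mentioned above, the following is a minimal sketch of k-means clustering over bounding-box widths and heights. It assumes the distance d = 1 - IoU popularised by YOLO9000; the abstract does not state which distance or how many clusters the authors used, so the function names, the toy box sizes, and the choice of k = 3 are all hypothetical.

```python
import random


def iou_wh(box, anchor):
    """IoU of two (width, height) boxes that share the same top-left corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors using d = 1 - IoU.

    Assumption: this follows the YOLO9000-style recipe, not necessarily
    the exact procedure used in the paper.
    """
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # initialise from k randomly chosen boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest (1 - IoU) distance.
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for i, members in enumerate(clusters):
            if members:
                w = sum(b[0] for b in members) / len(members)
                h = sum(b[1] for b in members) / len(members)
                new_anchors.append((w, h))
            else:
                new_anchors.append(anchors[i])  # keep the anchor of an empty cluster
        if new_anchors == anchors:  # assignments stopped changing
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])  # smallest area first


def mean_best_iou(boxes, anchors):
    """Average IoU between each box and its best-matching anchor."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)


# Hypothetical box sizes in pixels (not taken from UA-DETRAC).
toy_boxes = [(10, 12), (11, 13), (12, 11),
             (50, 40), (52, 38), (48, 42),
             (120, 90), (118, 95), (125, 88)]
anchors = kmeans_anchors(toy_boxes, k=3)
print("anchors:", anchors)
print("mean best IoU:", mean_best_iou(toy_boxes, anchors))
```

In practice the input would be the widths and heights of all ground-truth boxes in the training set, scaled to the network input resolution, and k would match the number of anchors the detector expects (the default YOLOv4 configuration uses nine). The mean best IoU gives a quick way to compare a dataset-specific anchor set against the default anchors.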
