VDetor: An Effective and Efficient Neural Network for Vehicle Detection in Aerial Image

Vehicle detection in aerial images is the foundation of applications such as traffic management and parking lot utilization analysis. Recently, general-purpose object detection methods based on convolutional neural networks (CNNs) have achieved state-of-the-art performance, mainly because CNNs extract more effective features than the handcrafted features used by earlier mainstream methods. Extracting effective features is crucial for vehicle detection in aerial images, where the vehicles are small and the background is complicated. Consequently, CNN-based methods have been used to detect vehicles in aerial images. However, performance may be poor when these general-purpose methods are applied directly. First, existing methods mostly detect vehicles with horizontal bounding boxes, which do not match vehicles in aerial images that appear at arbitrary orientations and with multiple aspect ratios. This mismatch directly harms detection accuracy. In addition, these methods are mostly computationally expensive, so they are not suitable for platforms with limited computational resources, such as unmanned aerial vehicles. To address these problems, we propose a vehicle detection network, VDetor, which introduces rotated bounding box regression and a lightweight network into SSD. Specifically, we use rotated bounding boxes rather than horizontal ones to match the vehicles more closely, and we use a lightweight network, PeleeNet, as the backbone of SSD to speed up inference. Experiments on the VEDAI dataset show that our model outperforms SSD in both detection accuracy and detection speed.
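To make the mismatch between horizontal boxes and rotated vehicles concrete, the following is a minimal sketch, not the paper's implementation. It assumes a rotated box is parameterized as (center, width, height, angle), which is an illustrative choice rather than the parameterization VDetor necessarily uses, and the function names are hypothetical. It measures how much of the tightest horizontal box is actually covered by the rotated box it encloses.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a w x h rectangle rotated about its center.

    The (cx, cy, w, h, angle) parameterization is an assumption made for
    illustration; the paper may encode rotated boxes differently.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    half_offsets = ((-w / 2, -h / 2), (w / 2, -h / 2),
                    (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in half_offsets]


def enclosing_horizontal_box(corners):
    """Return (x_min, y_min, x_max, y_max) of the tightest axis-aligned box."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def fill_ratio(w, h, angle_deg):
    """Fraction of the enclosing horizontal box covered by the rotated box."""
    corners = rotated_box_corners(0.0, 0.0, w, h, angle_deg)
    x0, y0, x1, y1 = enclosing_horizontal_box(corners)
    return (w * h) / ((x1 - x0) * (y1 - y0))


if __name__ == "__main__":
    for angle in (0, 15, 30, 45):
        print(angle, round(fill_ratio(40.0, 20.0, angle), 3))
```

Under these assumptions, a 40 x 20 box rotated by 45 degrees covers only about 44% of its enclosing horizontal box, so more than half of the horizontal box is background. This is the kind of mismatch that motivates regressing rotated boxes instead.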
