Faster RER-CNN: application to the detection of vehicles in aerial images

Detecting small vehicles in aerial images is a difficult task, challenging even for humans. Rotated objects, low resolution, small inter-class variability, and very large images with complicated backgrounds make the work of photo-interpreters tedious and wearisome. Unfortunately, even the best classical detection pipelines, such as Faster R-CNN, cannot be used off-the-shelf with good results, because they were built to process object-centric images from day-to-day life containing multi-scale vertical objects. In this work we build on the Faster R-CNN approach and turn it into a detection framework that deals appropriately with the rotation equivariance inherent to any aerial image task. This new pipeline (Faster Rotation Equivariant Regions CNN) gives, without any bells and whistles, state-of-the-art results on one of the most challenging aerial imagery datasets, VeDAI, and gives good results relative to the baseline Faster R-CNN on two others, Munich and GoogleEarth.
