Segmentation based rotated bounding boxes prediction and image synthesizing for object detection of high resolution aerial images

Abstract Object detection for aerial images is becoming an active topic in computer vision with many real-world applications. It is a very challenging task due to many factors such as highly complex background, arbitrary object orientations, high input resolution, etc. In this paper, we develop a new training and inference mechanism, which is shown to significantly improve the detection accuracy for high resolution aerial images. Instead of estimating the orientations of objects using direct regressions like in previous methods, we propose to predict the rotated bounding boxes by leveraging a segmentation task, which is easier to train and yields more accurate detection results. In addition, an image synthesizing based data augmentation strategy is presented to address the data imbalance issues in aerial object detection. Extensive experiments have been conducted to verify our contribution. The proposed method sets new state-of-the-art performance on the challenging DOTA dataset. The source codes will be available at http://ice.dlut.edu.cn/lu/publications.html.

[1]  Wenyu Liu,et al.  Multi-oriented Text Detection with Fully Convolutional Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Yongchao Gong,et al.  Mask Scoring R-CNN , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Wei Wu,et al.  SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Yiping Yang,et al.  Ship Rotated Bounding Box Space for Ship Extraction From High-Resolution Optical Satellite Images With Complex Backgrounds , 2016, IEEE Geoscience and Remote Sensing Letters.

[5]  Yilong Yin,et al.  Integrating QDWD with pattern distinctness and local contrast for underwater saliency detection , 2018, J. Vis. Commun. Image Represent..

[6]  G. Toussaint Solving geometric problems with the rotating calipers , 1983 .

[7]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[8]  Huicheng Zheng,et al.  Detail preservation and feature refinement for object detection , 2019, Neurocomputing.

[9]  Junwei Han,et al.  Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[10]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[12]  Larry S. Davis,et al.  SNIPER: Efficient Multi-Scale Training , 2018, NeurIPS.

[13]  Xiangyu Zhang,et al.  ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.

[14]  Qi Wang,et al.  Salience based object tracking in complex scenes , 2018, Neurocomputing.

[15]  Gui-Song Xia,et al.  Learning RoI Transformer for Detecting Oriented Objects in Aerial Images , 2018, ArXiv.

[16]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Yilong Yin,et al.  Saliency detection based on directional patches extraction and principal local color contrast , 2018, J. Vis. Commun. Image Represent..

[19]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[20]  Wenyu Liu,et al.  TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.

[21]  Michael Felsberg,et al.  ATOM: Accurate Tracking by Overlap Maximization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Pan He,et al.  Detecting Text in Natural Image with Connectionist Text Proposal Network , 2016, ECCV.

[23]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Qixiang Ye,et al.  Orientation robust object detection in aerial images using deep convolutional neural network , 2015, 2015 IEEE International Conference on Image Processing (ICIP).

[26]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Xiaogang Wang,et al.  Visual Tracking with Fully Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28]  Xuelong Li,et al.  PixelLink: Detecting Scene Text via Instance Segmentation , 2018, AAAI.

[29]  Peter Reinartz,et al.  Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery , 2018, ACCV.

[30]  Menglong Yan,et al.  Automatic Ship Detection in Remote Sensing Images from Google Earth of Complex Scenes Based on Multiscale Rotation Dense Feature Pyramid Networks , 2018, Remote. Sens..

[31]  Yilong Yin,et al.  The extended marine underwater environment database and baseline evaluations , 2019, Appl. Soft Comput..

[32]  Kai Chen,et al.  Hybrid Task Cascade for Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Yong Zhang,et al.  Text string detection for loosely constructed characters with arbitrary orientations , 2015, Neurocomputing.

[34]  Larry S. Davis,et al.  Dynamic Zoom-in Network for Fast Object Detection in Large Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Wenyu Liu,et al.  PCL: Proposal Cluster Learning for Weakly Supervised Object Detection , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  LinLin Shen,et al.  Visual-Patch-Attention-Aware Saliency Detection , 2015, IEEE Transactions on Cybernetics.

[37]  Chao Li,et al.  Cascade Region Proposal and Global Context for Deep Object Detection , 2017, Neurocomputing.

[38]  Jiebo Luo,et al.  DOTA: A Large-Scale Dataset for Object Detection in Aerial Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Xiangyang Xue,et al.  Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[40]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Menglong Yan,et al.  Position Detection and Direction Prediction for Arbitrary-Oriented Ships via Multitask Rotation Region Convolutional Neural Network , 2018, IEEE Access.

[42]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Frédéric Jurie,et al.  Vehicle detection in aerial imagery : A small target detection benchmark , 2016, J. Vis. Commun. Image Represent..

[46]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).