IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection

Object detection in aerial images is a challenging task due to its lack of visiable features and variant orientation of objects. Currently, amount of R-CNN framework based detectors have made significant progress in predicting targets by horizontal bounding boxes (HBB) and oriented bounding boxes (OBB). However, there is still open space for one-stage anchor free solutions. This paper proposes a one-stage anchor free detector for orientional object in aerial images, which is built upon a per-pixel prediction fashion detector. We make it possible by developing a branch interacting module with a self-attention mechanism to fuse features from classification and box regression branchs. Moreover a geometric transformation is employed in angle prediction to make it more manageable for the prediction network. We also introduce an IOU loss for OBB detection, which is more efficient than regular polygon IOU. The propsed method is evaluated on DOTA and HRSC2016 datasets, and the outcomes show the higher OBB detection performance from our propsed IENet when compared with the state-of-the-art detectors.

[1]  Xiang Bai,et al.  TextBoxes++: A Single-Shot Oriented Scene Text Detector , 2018, IEEE Transactions on Image Processing.

[2]  Wenxian Yu,et al.  Toward Arbitrary-Oriented Ship Detection With Rotated Region Proposal and Discrimination Networks , 2018, IEEE Geoscience and Remote Sensing Letters.

[3]  Dustin Tran,et al.  Image Transformer , 2018, ICML.

[4]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Peter Reinartz,et al.  Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery , 2018, ACCV.

[6]  Jean-Michel Morel,et al.  A non-local algorithm for image denoising , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[9]  Yiping Yang,et al.  Rotated region based CNN for ship detection , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[10]  Shuicheng Yan,et al.  Scale-Aware Fast R-CNN for Pedestrian Detection , 2015, IEEE Transactions on Multimedia.

[11]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Gang Yu,et al.  Scene Text Detection with Supervised Pyramid Context Network , 2018, AAAI.

[15]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Xilin Chen,et al.  Interlaced Sparse Self-Attention for Semantic Segmentation , 2019, ArXiv.

[17]  Jiebo Luo,et al.  DOTA: A Large-Scale Dataset for Object Detection in Aerial Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[19]  Linjie Xing,et al.  Convolutional Character Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Han Zhang,et al.  Self-Attention Generative Adversarial Networks , 2018, ICML.

[21]  Ying Chen,et al.  M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network , 2018, AAAI.

[22]  Jitendra Malik,et al.  Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Zhi Tang,et al.  CBNet: A Novel Composite Backbone Network Architecture for Object Detection , 2019, AAAI.

[26]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[27]  Lianwen Jin,et al.  Omnidirectional Scene Text Detection with Sequential-free Box Discretization , 2019, IJCAI.

[28]  Xiangyang Xue,et al.  Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[29]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, International Journal of Computer Vision.

[31]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[32]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[33]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[34]  Léon Bottou,et al.  Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[35]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[37]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[39]  Alexander J. Smola,et al.  Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Lianwen Jin,et al.  Tightness-Aware Evaluation Protocol for Scene Text Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[43]  Dongyoon Han,et al.  Character Region Awareness for Text Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Gui-Song Xia,et al.  Learning RoI Transformer for Detecting Oriented Objects in Aerial Images , 2018, ArXiv.