Cross-Layer Attention Network for Small Object Detection in Remote Sensing Imagery

In recent years, despite tremendous progress in object detection, small object detection has remained a challenge in remote sensing. The main reason is that small objects carry few features, and these are easily lost during down-sampling. In this article, we propose a cross-layer attention network that aims to obtain stronger features of small objects for better detection. Specifically, we design an up-sampling and down-sampling feature pyramid that obtains richer context information by bidirectionally fusing deep and shallow features, together with skip connections. Moreover, we design a cross-layer attention module that captures the nonlocal associations of small objects within each layer and further strengthens their representation through cross-layer integration and balance. Extensive experiments on two publicly available datasets (DIOR and NWPU VHR-10) and two self-assembled datasets (SDOTA and SDD) show that our method performs favorably against other detectors. In particular, our method achieves 74.3% mAP on the public DIOR dataset without any additional tricks.
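The abstract gives no module equations. The following Python/NumPy sketch therefore only illustrates the two ideas it names, and it rests on explicit assumptions. The bidirectional pyramid uses nearest-neighbour up-sampling, average-pool down-sampling, element-wise addition and a same-level skip connection. The cross-layer attention step applies an embedded-Gaussian non-local block to each level, with weights shared across levels. It then integrates and balances the levels by averaging them at one reference resolution and adding the result back to every level, in the style of the balanced feature pyramid of Libra R-CNN. All function names (resize, bidirectional_pyramid, non_local, cross_layer_attention) are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def resize(feat, size):
    """Resize a (C, H, W) map to size = (H', W') by an integer factor.

    Assumption: H/H' == W/W' (or the inverse) is an integer. Shrinking uses
    average pooling; enlarging uses nearest-neighbour repetition.
    """
    c, h, w = feat.shape
    th, tw = size
    if h >= th:
        k = h // th
        return feat.reshape(c, th, k, tw, k).mean(axis=(2, 4))
    k = th // h
    return feat.repeat(k, axis=1).repeat(k, axis=2)


def bidirectional_pyramid(levels):
    """Hypothetical fusion over levels ordered shallow (high-res) to deep."""
    # Top-down pass: up-sample deeper maps and add them to shallower ones.
    td = list(levels)
    for i in range(len(td) - 2, -1, -1):
        td[i] = td[i] + resize(td[i + 1], td[i].shape[1:])
    # Bottom-up pass: down-sample the fused shallower map into the deeper one,
    # plus a skip connection from the original input at the same level.
    out = [td[0]]
    for i in range(1, len(td)):
        out.append(td[i] + resize(out[i - 1], td[i].shape[1:]) + levels[i])
    return out


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def non_local(feat, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local block with a residual connection."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                       # (C, N): positions as columns
    theta, phi, g = w_theta @ x, w_phi @ x, w_g @ x  # each (C', N)
    attn = softmax(theta.T @ phi, axis=-1)           # (N, N) pairwise affinities
    y = g @ attn.T                                   # y_i = sum_j attn[i, j] * g_j
    return feat + (w_out @ y).reshape(c, h, w)


def cross_layer_attention(levels, weights, ref=1):
    """Per-level non-local attention, then balanced cross-layer integration."""
    # 1. Non-local association within each layer (shared weights: an assumption).
    attended = [non_local(f, *weights) for f in levels]
    # 2. Integrate: resize every level to the reference level and average.
    size = attended[ref].shape[1:]
    fused = np.mean([resize(f, size) for f in attended], axis=0)
    # 3. Balance: redistribute the fused map back to every level residually.
    return [f + resize(fused, f.shape[1:]) for f in attended]


# Usage: three levels with 8 channels at 16x16, 8x8 and 4x4.
rng = np.random.default_rng(0)
C, C_red = 8, 4
weights = (rng.normal(size=(C_red, C)), rng.normal(size=(C_red, C)),
           rng.normal(size=(C_red, C)), rng.normal(size=(C, C_red)) * 0.1)
levels = [rng.normal(size=(C, s, s)) for s in (16, 8, 4)]
outs = cross_layer_attention(bidirectional_pyramid(levels), weights)
print([o.shape for o in outs])  # [(8, 16, 16), (8, 8, 8), (8, 4, 4)]
```

In this sketch, the projection matrices stand in for the 1x1 convolutions that non-local blocks normally use. Averaging at a single reference level keeps the integration step free of extra parameters, and every output keeps its input resolution.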
