An Improved Aggregated-Mosaic Method for the Sparse Object Detection of Remote Sensing Imagery

Object detection in remote sensing imagery has become increasingly popular over the past few years. Unlike natural images taken by people or surveillance cameras, remote sensing images are very large, so training and inference must be performed on cropped image tiles. However, objects in remote sensing imagery are often sparsely distributed, and the number of labels per class is imbalanced, which makes training and inference unstable. In this paper, we analyze the training characteristics of remote sensing images and propose the aggregated-mosaic training method, which fuses assigned-stitch augmentation with auto-target-duplication. In particular, based on the ground truth and the mosaic image size, assigned-stitch augmentation ensures that each training sample contains an appropriate number of objects, which smooths the training procedure. Hard-to-detect objects, or those in classes with few samples, are randomly selected and duplicated by auto-target-duplication, which mitigates sample imbalance and poor results on under-represented classes. Thus, the training process is able to focus on weak classes. We verify the proposed method on VEDAI and NWPU VHR-10, two remote sensing datasets with sparse objects. Because YOLOv5 adopts Mosaic as its augmentation method and is one of the state-of-the-art detectors, we choose Mosaic (YOLOv5) as the baseline. Results demonstrate that our method outperforms Mosaic (YOLOv5) by 2.72% and 5.44% on 512 × 512 and 1024 × 1024 resolution imagery, respectively. Moreover, the proposed method outperforms Mosaic (YOLOv5) by 5.48% on the NWPU VHR-10 dataset.
