PENet: Object Detection Using Points Estimation in High Definition Aerial Images

Aerial imagery has been increasingly adopted in mission-critical tasks such as traffic surveillance, smart cities, and disaster assistance. However, identifying objects in aerial images faces the following challenges: 1) objects of interest are often too small and too dense relative to the image; 2) objects of interest often vary widely in relative size; and 3) the number of objects in each category is imbalanced. This work proposes a novel network structure, Points Estimated Network (PENet), to address these challenges. PENet uses a Mask Resampling Module (MRM) to augment the imbalanced datasets, a coarse anchor-free detector (CPEN) to predict the center points of small-object clusters, and a fine anchor-free detector (FPEN) to locate the precise positions of the small objects. An adaptive merge algorithm, Non-maximum Merge (NMM), is implemented in CPEN to address the issue of detecting dense small objects, and a hierarchical loss is defined in FPEN to further improve classification accuracy. Our extensive experiments on the aerial datasets VisDrone [1] and UAVDT [2] showed that PENet achieved higher precision than existing state-of-the-art approaches. Our best model achieved an 8.7% improvement on VisDrone and 20.3% on UAVDT.

[1] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.

[2] Qi Tian, et al. The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking, 2018, ECCV.

[3] Wenyu Liu, et al. Point Linking Network for Object Detection, 2017, ArXiv.

[4] Kyunghyun Cho, et al. Augmentation for small object detection, 2019, 9th International Conference on Advances in Computing and Information Technology (ACITY 2019).

[5] Heesung Kwon, et al. Delving Into Robust Object Detection From Unmanned Aerial Vehicles: A Deep Nuisance Disentanglement Approach, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6] Ali Farhadi, et al. YOLO9000: Better, Faster, Stronger, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Qinghua Hu, et al. Vision Meets Drones: A Challenge, 2018, ArXiv.

[8] Fahad Shahbaz Khan, et al. Learning Rich Features at High-Speed for Single-Shot Object Detection, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[9] Kaiming He, et al. Feature Pyramid Networks for Object Detection, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Ali Farhadi, et al. You Only Look Once: Unified, Real-Time Object Detection, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Xingyi Zhou, et al. Objects as Points, 2019, ArXiv.

[12] Shifeng Zhang, et al. Single-Shot Refinement Neural Network for Object Detection, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13] Rachel Huang, et al. YOLO-LITE: A Real-Time Object Detection Algorithm Optimized for Non-GPU Computers, 2018, 2018 IEEE International Conference on Big Data (Big Data).

[14] Xiangyu Zhang, et al. DetNet: A Backbone network for Object Detection, 2018, ArXiv.

[15] Hei Law, et al. CornerNet: Detecting Objects as Paired Keypoints, 2018, ECCV.

[16] Mubarak Shah, et al. ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17] Nuno Vasconcelos, et al. Cascade R-CNN: Delving Into High Quality Object Detection, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18] Erik Blasch, et al. Clustered Object Detection in Aerial Images, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19] Farid Melgani, et al. Detecting Cars in UAV Images With a Catalog-Based Approach, 2014, IEEE Transactions on Geoscience and Remote Sensing.

[20] Lars Petersson, et al. DeNet: Scalable Real-Time Object Detection with Directed Sparse Sampling, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21] Ross B. Girshick, et al. Fast R-CNN, 2015, 1504.08083.

[22] Luc Van Gool, et al. The Pascal Visual Object Classes (VOC) Challenge, 2010, International Journal of Computer Vision.

[23] Wei Liu, et al. SSD: Single Shot MultiBox Detector, 2015, ECCV.

[24] Ross B. Girshick, et al. Focal Loss for Dense Object Detection, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] F. Ozge Unel, et al. The Power of Tiling for Small Object Detection, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[26] Larry S. Davis, et al. Dynamic Zoom-in Network for Fast Object Detection in Large Images, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27] Xiangyu Zhang, et al. Light-Head R-CNN: In Defense of Two-Stage Object Detector, 2017, ArXiv.

[28] Dongyu Zhang, et al. How to Fully Exploit The Abilities of Aerial Image Detectors, 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[29] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[30] Ross B. Girshick, et al. Mask R-CNN, 2017, 1703.06870.

[31] Xin Sun, et al. RRNet: A Hybrid Detector for Object Detection in Drone-Captured Images, 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).