Drone-based object detection has been widely applied in ground object surveillance, urban patrol, and other fields. However, the dramatic scale changes and complex backgrounds in drone images usually lead to weak feature representations for small objects, which makes high-precision detection challenging. To improve small object detection, this paper proposes a novel cross-scale knowledge distillation (CSKD) method that enhances the features of small objects in a manner similar to image enlargement, and is therefore termed ZoomInNet. First, based on an efficient feature pyramid network structure, the teacher and student networks are trained with images at different scales to introduce cross-scale features. Then, the proposed layer adaption (LA) and feature level alignment (FA) mechanisms are applied to align the feature sizes of the two models. After that, the adaptive key distillation point (AKDP) algorithm is used to select the crucial positions in the feature maps where knowledge distillation is needed. Finally, a position-aware L2 loss measures the difference between the feature maps of the cross-scale models, compressing cross-scale information into a single model. Experiments on the challenging VisDrone2018 dataset show that the proposed method draws on the advantages of image pyramid methods while avoiding their heavy computational cost, and significantly improves the detection accuracy of small objects. In addition, a comparison with mainstream methods shows that our method achieves the best performance in small object detection among the compared methods.
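To make the last step more concrete, the following is a minimal sketch of what a position-aware L2 loss could look like, assuming the teacher and student feature maps have already been aligned to the same shape. It is not the paper's implementation: the function names `key_point_mask` and `position_aware_l2`, the square neighbourhood around object centres, and the normalisation by the number of key positions are all illustrative assumptions, and the mask helper is only a stand-in for the AKDP algorithm, whose details are not given here.

```python
import numpy as np


def key_point_mask(height, width, centers, radius=1):
    """Hypothetical stand-in for key-point selection (not AKDP itself):
    mark a square neighbourhood around each (row, col) object centre."""
    mask = np.zeros((height, width), dtype=np.float32)
    for r, c in centers:
        r0, r1 = max(r - radius, 0), min(r + radius + 1, height)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, width)
        mask[r0:r1, c0:c1] = 1.0
    return mask


def position_aware_l2(student, teacher, mask):
    """Squared L2 distance between two (C, H, W) feature maps,
    evaluated only at masked positions and averaged over them.
    The averaging scheme is an assumption of this sketch."""
    if student.shape != teacher.shape:
        raise ValueError("feature maps must be aligned to the same shape first")
    # Sum squared differences over channels to get one value per position.
    per_position = ((student - teacher) ** 2).sum(axis=0)  # shape (H, W)
    n_key = mask.sum()
    if n_key == 0:
        return 0.0  # no key positions selected, nothing to distil
    return float((per_position * mask).sum() / n_key)


# Toy usage: the student differs from the teacher at a single position.
rng = np.random.default_rng(0)
teacher_feat = rng.normal(size=(4, 8, 8))
student_feat = teacher_feat.copy()
student_feat[:, 2, 3] += 1.0
mask = key_point_mask(8, 8, centers=[(2, 3)], radius=0)
loss = position_aware_l2(student_feat, teacher_feat, mask)
```

In this sketch, positions outside the mask contribute nothing to the loss, so the student is pushed towards the teacher only where small objects are assumed to lie.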