Transformed Dynamic Feature Pyramid for Small Object Detection

Small objects have low resolution and carry little feature information, which makes them difficult to recognize and localize and greatly hinders improvements in object detection accuracy. This paper presents TDFP, an object detection model based on a CNN and a transformer that combines local and global context to establish connections between features. In the proposed transformed dynamic feature pyramid network, a transformer module dynamically transforms and fuses the multi-scale features generated by the backbone, producing a transformed feature pyramid with richer multi-scale features and context information. During this transformation, a gate block dynamically selects between single-scale and cross-scale transformation to achieve the most suitable transformation and fusion of multi-scale features. Experimental results show that the model improves small-object detection accuracy. With a ResNeXt-101 backbone, TDFP achieves 46.2% AP and 26.3% AP_S on MS COCO, and it uses the amount of computation as a loss constraint to achieve a better balance between detection accuracy and computational complexity.
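
As a rough illustration of the gating idea described above, the following minimal sketch blends a single-scale branch and a cross-scale branch with a soft gate and adds a computation penalty to the loss. It is not the TDFP implementation: the gate form (a sigmoid of the mean activation), the placeholder branches, the 1-D features, the branch costs, and the names `gate_value`, `transform_level`, `total_loss`, and `lam` are all assumptions made for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def resize_nearest(feat, size):
    """Nearest-neighbour resize of a 1-D feature list to `size` elements."""
    n = len(feat)
    return [feat[min(n - 1, (i * n) // size)] for i in range(size)]


def gate_value(feat, weight=1.0, bias=0.0):
    """Hypothetical gate: sigmoid of a linear function of the mean activation."""
    mean = sum(feat) / len(feat)
    return sigmoid(weight * mean + bias)


def transform_level(pyramid, level, cost_single=1.0, cost_cross=3.0):
    """Blend a single-scale and a cross-scale branch for one pyramid level.

    Both branches are stand-ins (assumptions): the single-scale branch is the
    identity, and the cross-scale branch averages all levels after resizing
    them to this level's resolution. Returns (fused, gate, expected_cost).
    """
    feat = pyramid[level]
    g = gate_value(feat)
    single = list(feat)
    resized = [resize_nearest(f, len(feat)) for f in pyramid]
    cross = [sum(col) / len(resized) for col in zip(*resized)]
    fused = [(1.0 - g) * s + g * c for s, c in zip(single, cross)]
    # Expected computation under the soft gate; the costs are arbitrary units.
    cost = (1.0 - g) * cost_single + g * cost_cross
    return fused, g, cost


def total_loss(detection_loss, costs, lam=0.01):
    """Detection loss plus a weighted computation penalty (assumed form)."""
    return detection_loss + lam * sum(costs)


if __name__ == "__main__":
    pyramid = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0], [3.0]]  # toy 3-level pyramid
    costs = []
    for lvl in range(len(pyramid)):
        fused, g, cost = transform_level(pyramid, lvl)
        costs.append(cost)
        print(lvl, round(g, 3), [round(v, 3) for v in fused])
    print("loss:", total_loss(1.0, costs))
```

In this sketch, a gate near 0 keeps the cheaper single-scale branch and a gate near 1 favors the costlier cross-scale branch, so penalizing the expected cost in the loss pushes the gate toward cross-scale fusion only where it helps detection.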