A Non-Local Attention Feature Fusion Network for Multiscale Object Detection

Feature pyramid networks (FPNs) are a typical architecture for building networks with high-level semantic features, which are essential for recognizing objects at different scales. However, FPNs have severe shortcomings in the feature extraction and fusion stages: the extracted features lack contextual and deep semantic information. In this work, we propose a non-local channel and spatial attention feature pyramid network (NCS-FPN) to improve multi-scale learning. During feature extraction, contextual semantic information from different scales is collected through non-local attention networks. During feature fusion, deeper feature information from both spatial and context-aware sources is aggregated to enhance multi-scale feature extraction. Extensive experiments on two public datasets, MS COCO and PASCAL VOC, demonstrate that NCS-FPN achieves better performance than several state-of-the-art methods in most cases.
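To make the idea of collecting context through non-local attention concrete, the following is a minimal NumPy sketch of a generic non-local operation in its embedded-Gaussian (softmax) form, applied to a single feature map. It is not the NCS-FPN module itself: the abstract does not specify the block's layout, so the function name `non_local_block`, the projection matrices `w_theta`, `w_phi`, `w_g`, `w_out`, and the residual connection are all assumptions made for illustration.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Generic non-local attention over one feature map (illustrative only).

    x:        feature map of shape (C, H, W)
    w_theta,
    w_phi,
    w_g:      hypothetical 1x1 projections, each of shape (C_inner, C)
    w_out:    hypothetical output projection of shape (C, C_inner)
    Returns the refined map (C, H, W) and the attention matrix (H*W, H*W).
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)        # (C, N), with N = H * W positions

    theta = w_theta @ flat            # queries, (C_inner, N)
    phi = w_phi @ flat                # keys,    (C_inner, N)
    g = w_g @ flat                    # values,  (C_inner, N)

    # Row i holds the weights that position i assigns to every position j,
    # so each output position can draw on the whole map, not a local window.
    attn = softmax(theta.T @ phi, axis=-1)   # (N, N)

    y = g @ attn.T                    # y[:, i] = sum_j attn[i, j] * g[:, j]
    z = w_out @ y + flat              # project back and add the residual
    return z.reshape(c, h, w), attn


rng = np.random.default_rng(0)
channels, inner = 8, 4
feature = rng.standard_normal((channels, 4, 5))
refined, attention = non_local_block(
    feature,
    rng.standard_normal((inner, channels)),
    rng.standard_normal((inner, channels)),
    rng.standard_normal((inner, channels)),
    rng.standard_normal((channels, inner)),
)
print(refined.shape, attention.shape)
```

Because the attention matrix relates every position to every other position, its size grows quadratically with the number of positions in the feature map, which is one reason such blocks are usually applied to the coarser pyramid levels or to reduced-channel projections.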
