Paper Information - Fine-Grained Dynamic Head for Object Detection

Fine-Grained Dynamic Head for Object Detection

The Feature Pyramid Network (FPN) alleviates scale variance in object representation by performing instance-level assignments. Nevertheless, this strategy ignores the distinct characteristics of different sub-regions within an instance. To this end, we propose a fine-grained dynamic head that conditionally selects a pixel-level combination of FPN features from different scales for each instance, which further unlocks the capacity of multi-scale feature representation. Moreover, we design a spatial gate with a new activation function that dramatically reduces computational complexity through spatially sparse convolutions. Extensive experiments demonstrate the effectiveness and efficiency of the proposed method on several state-of-the-art detection benchmarks. Code is available online.
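The idea of a pixel-level, gated combination of multi-scale features can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the activation function or the routing layout, so the gate `max(0, tanh(x))`, the nearest-neighbour resizing, and the names `gate_activation`, `nearest_resize`, and `gated_combine` are all assumptions made for illustration. The sketch uses single-channel maps stored as nested lists and only shows the principle that a gate value of exactly zero lets the computation at that pixel be skipped.

```python
import math


def gate_activation(x):
    # Stand-in activation (an assumption, not taken from the abstract):
    # it is exactly zero for non-positive inputs, so a pixel can be
    # switched off completely rather than merely down-weighted.
    return max(0.0, math.tanh(x))


def nearest_resize(fmap, out_h, out_w):
    """Resize a 2-D map (list of rows) with nearest-neighbour sampling."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def gated_combine(features, gate_logits, target=1):
    """Combine single-channel maps from several scales, pixel by pixel.

    features    : list of 2-D maps, one per pyramid level
    gate_logits : list of 2-D maps with the same shapes as `features`
    target      : index of the level whose resolution the output uses

    Returns the combined map and the fraction of (pixel, level) pairs
    whose gate was active.
    """
    out_h, out_w = len(features[target]), len(features[target][0])
    out = [[0.0] * out_w for _ in range(out_h)]
    active = total = 0

    for fmap, logits in zip(features, gate_logits):
        h, w = len(fmap), len(fmap[0])
        gated = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                total += 1
                g = gate_activation(logits[i][j])
                if g > 0.0:
                    # Only active pixels are processed; a sparse
                    # convolution would do its work here and skip the rest.
                    gated[i][j] = g * fmap[i][j]
                    active += 1
        resized = nearest_resize(gated, out_h, out_w)
        for i in range(out_h):
            for j in range(out_w):
                out[i][j] += resized[i][j]

    return out, active / total


if __name__ == "__main__":
    # Three hypothetical pyramid levels: 4x4, 2x2, and 1x1.
    features = [[[1.0] * 4 for _ in range(4)], [[2.0, 2.0], [2.0, 2.0]], [[3.0]]]
    gate_logits = [
        [[-1.0] * 4 for _ in range(4)],
        [[5.0, -5.0], [-5.0, 5.0]],
        [[-2.0]],
    ]
    combined, active_fraction = gated_combine(features, gate_logits, target=1)
    print(combined)
    print(active_fraction)
```

In this toy input only two of the 21 (pixel, level) pairs have a positive gate, so most positions contribute nothing and would not need to be computed. A real detector would apply this per channel group with learned gate logits and a proper sparse convolution kernel.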
