论文信息 - ACFNet: Attentional Class Feature Network for Semantic Segmentation

ACFNet: Attentional Class Feature Network for Semantic Segmentation

Recent works have made great progress in semantic segmentation by exploiting richer context, most of which are designed from a spatial perspective. In contrast to previous works, we present the concept of class center which extracts the global context from a categorical perspective. This class-level context describes the overall representation of each class in an image. We further propose a novel module, named Attentional Class Feature (ACF) module, to calculate and adaptively combine different class centers according to each pixel. Based on the ACF module, we introduce a coarse-to-fine segmentation network, called Attentional Class Feature Network (ACFNet), which can be composed of an ACF module and any off-the-shell segmentation network (base network). In this paper, we use two types of base networks to evaluate the effectiveness of ACFNet. We achieve new state-of-the-art performance of 81.85% mIoU on Cityscapes dataset with only finely annotated data used for training.

[1] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[3] Yunchao Wei,et al. CCNet: Criss-Cross Attention for Semantic Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4] Gang Wang,et al. Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6] H. Robbins. A Stochastic Approximation Method , 1951 .

[7] Alan L. Yuille,et al. A 3D Coarse-to-Fine Framework for Automatic Pancreas Segmentation , 2017, ArXiv.

[8] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[9] Kun Yu,et al. DenseASPP for Semantic Segmentation in Street Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10] Xiangyu Zhang,et al. Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Gang Wang,et al. Scene Segmentation with DAG-Recurrent Neural Networks , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] Wei Liu,et al. ParseNet: Looking Wider to See Better , 2015, ArXiv.

[13] George Papandreou,et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[14] Dan Wang,et al. A novel coarse-to-fine hair segmentation method , 2011, Face and Gesture 2011.

[15] Marcus Liwicki,et al. Scene labeling with LSTM recurrent neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.

[17] Jingdong Wang,et al. OCNet: Object Context Network for Scene Parsing , 2018, ArXiv.

[18] Thomas Brox,et al. High Accuracy Optical Flow Estimation Based on a Theory for Warping , 2004, ECCV.

[19] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Jitendra Malik,et al. ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21] Cheng Li,et al. Face alignment by coarse-to-fine shape searching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Gang Yu,et al. BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.

[23] Garrison W. Cottrell,et al. Understanding Convolution for Semantic Segmentation , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[24] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Anton van den Hengel,et al. Wider or Deeper: Revisiting the ResNet Model for Visual Recognition , 2016, Pattern Recognit..

[26] Xiaogang Wang,et al. Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29] Xiaogang Wang,et al. Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Yang Wang,et al. Label Refinement Network for Coarse-to-Fine Semantic Segmentation , 2017, ArXiv.

[31] Xiaoxiao Li,et al. Semantic Image Segmentation via Deep Parsing Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Roberto Cipolla,et al. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Jun Fu,et al. Dual Attention Network for Scene Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[36] Iasonas Kokkinos,et al. Dense and Low-Rank Gaussian CRFs Using Deep Embeddings , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[38] Gang Yu,et al. Learning a Discriminative Feature Network for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39] Yali Amit,et al. A coarse-to-fine strategy for multiclass shape detection , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40] Konstantinos Kamnitsas,et al. Autofocus Layer for Semantic Segmentation , 2018, MICCAI.

[41] Yi Zhang,et al. PSANet: Point-wise Spatial Attention Network for Scene Parsing , 2018, ECCV.

[42] Lorenzo Porzi,et al. In-place Activated BatchNorm for Memory-Optimized Training of DNNs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43] Ian D. Reid,et al. RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44] Yichen Wei,et al. Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45] Anton van den Hengel,et al. High-performance Semantic Segmentation Using Very Deep Fully Convolutional Networks , 2016, ArXiv.

[46] Donald Geman,et al. Coarse-to-Fine Face Detection , 2004, International Journal of Computer Vision.

[47] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[48] George Papandreou,et al. Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.