论文信息 - Rethinking Learnable Tree Filter for Generic Feature Transform

Rethinking Learnable Tree Filter for Generic Feature Transform

The Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation. Nevertheless, the intrinsic geometric constraint forces it to focus on the regions with close spatial distance, hindering the effective long-range interactions. To relax the geometric constraint, we give the analysis by reformulating it as a Markov Random Field and introduce a learnable unary term. Besides, we propose a learnable spanning tree algorithm to replace the original non-differentiable one, which further improves the flexibility and robustness. With the above improvements, our method can better capture long-range dependencies and preserve structural details with linear complexity, which is extended to several vision tasks for more generic feature transform. Extensive experiments on object detection/instance segmentation demonstrate the consistent improvements over the original version. For semantic segmentation, we achieve leading performance (82.1% mIoU) on the Cityscapes benchmark without bells-and-whistles. Code is available at https://github.com/StevenGrove/LearnableTreeFilterV2.

[1] Bernard Harris,et al. Graph theory and its applications , 1970 .

[2] Yunchao Wei,et al. CCNet: Criss-Cross Attention for Semantic Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[4] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Jian Sun,et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Nuno Vasconcelos,et al. Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[8] J. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem , 1956 .

[9] Gang Sun,et al. Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks , 2018, NeurIPS.

[10] Kun Yu,et al. DenseASPP for Semantic Segmentation in Street Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[12] Gang Yu,et al. Learning a Discriminative Feature Network for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[14] Razvan Pascanu,et al. Interaction Networks for Learning about Objects, Relations and Physics , 2016, NIPS.

[15] Martin Simonovsky,et al. Large-Scale Point Cloud Semantic Segmentation with Superpoint Graphs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16] Jiashi Feng,et al. Strip Pooling: Rethinking Spatial Pooling for Scene Parsing , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.

[18] Stan Z. Li,et al. Markov Random Field Models in Computer Vision , 1994, ECCV.

[19] Jian Sun,et al. Learnable Tree Filter for Structure-preserving Feature Transform , 2019, NeurIPS.

[20] Changxin Gao,et al. GLNet: Global Local Network for Weakly Supervised Action Localization , 2020, IEEE Transactions on Multimedia.

[21] Jun Fu,et al. Dual Attention Network for Scene Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[23] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24] Stefano Mattoccia,et al. Beyond Local Reasoning for Stereo Confidence Estimation with Deep Learning , 2018, ECCV.

[25] Gang Yu,et al. BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.

[26] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..

[27] George Papandreou,et al. Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[28] Qingxiong Yang,et al. Stereo Matching Using Tree Filtering , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29] Jian Sun,et al. Fine-Grained Dynamic Head for Object Detection , 2020, NeurIPS.

[30] Xuming He,et al. LatentGNN: Learning Efficient Non-local Relations for Visual Recognition , 2019, ICML.

[31] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32] Yi Zhang,et al. PSANet: Point-wise Spatial Attention Network for Scene Parsing , 2018, ECCV.

[33] Vibhav Vineet,et al. Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[34] Nanning Zheng,et al. Stereo Matching Using Belief Propagation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[35] George Papandreou,et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[36] Yi Li,et al. Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38] David Eppstein,et al. Spanning Trees and Spanners , 2000, Handbook of Computational Geometry.

[39] Xiaoxiao Li,et al. Deep Learning Markov Random Field for Semantic Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40] Kuk-Jin Yoon,et al. Leveraging stereo matching with learning-based confidence measures , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41] W. Freeman,et al. Generalized Belief Propagation , 2000, NIPS.

[42] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43] Gang Yu,et al. TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44] Stefano Mattoccia,et al. Quantitative Evaluation of Confidence Measures in a Machine Learning World , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[45] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46] Xiangyu Zhang,et al. Learning Dynamic Routing for Semantic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47] Jitendra Malik,et al. Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[48] Yichen Wei,et al. Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49] Daniel P. Huttenlocher,et al. Efficient Belief Propagation for Early Vision , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[50] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.

[51] Olga Veksler,et al. Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[52] Raquel Urtasun,et al. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks , 2016, NIPS.

[53] Xiaogang Wang,et al. Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54] Stephen Lin,et al. GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).