Paper Information - Decoupled Dynamic Filter Networks

Decoupled Dynamic Filter Networks

Convolution is one of the basic building blocks of CNN architectures. Despite its common use, standard convolution has two main shortcomings: it is content-agnostic and computation-heavy. Dynamic filters are content-adaptive, but they further increase the computational overhead. Depth-wise convolution is a lightweight variant, but it usually leads to a drop in CNN performance or requires a larger number of channels. In this work, we propose the Decoupled Dynamic Filter (DDF), which tackles both of these shortcomings simultaneously. Inspired by recent advances in attention, DDF decouples a depth-wise dynamic filter into spatial and channel dynamic filters. This decomposition considerably reduces the number of parameters and limits computational costs to the same level as depth-wise convolution. Meanwhile, we observe a significant boost in performance when replacing standard convolution with DDF in classification networks. ResNet50 and ResNet101 improve by 1.9% and 1.3% in top-1 accuracy, respectively, while their computational costs are reduced by nearly half. Experiments on detection and joint upsampling networks also demonstrate the superior performance of the DDF upsampling variant (DDF-Up) in comparison with standard convolution and specialized content-adaptive layers. The project page with code is available online.
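
To make the decoupling idea concrete, the following is a minimal pure-Python sketch and not the authors' implementation. It assumes that the per-pixel, per-channel filter is formed as the element-wise product of a spatial filter (one k×k filter per pixel) and a channel filter (one k×k filter per channel), and that borders are zero-padded. The function names `ddf_apply` and `filter_value_counts` are hypothetical, and the filters are passed in directly rather than predicted from the input as a real dynamic filter layer would do.

```python
def ddf_apply(x, spatial, channel, k):
    """Apply a decoupled dynamic filter (illustrative sketch only).

    x:       input feature map, nested lists shaped [C][H][W]
    spatial: one flattened k*k filter per pixel, shaped [H][W][k*k]
    channel: one flattened k*k filter per channel, shaped [C][k*k]
    Assumption: the effective weight at (channel c, pixel i,j, tap t) is
    spatial[i][j][t] * channel[c][t]; borders are zero-padded.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    r = k // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        y, z = i + di - r, j + dj - r
                        if 0 <= y < H and 0 <= z < W:
                            t = di * k + dj
                            acc += spatial[i][j][t] * channel[c][t] * x[c][y][z]
                out[c][i][j] = acc
    return out


def filter_value_counts(C, H, W, k):
    """Count the filter values needed for one feature map.

    A full depth-wise dynamic filter needs a separate k*k filter for every
    channel at every pixel; the decoupled form needs one per pixel plus
    one per channel.
    """
    full = C * H * W * k * k
    decoupled = H * W * k * k + C * k * k
    return full, decoupled


if __name__ == "__main__":
    C, H, W, k = 2, 3, 3, 3
    x = [[[1.0] * W for _ in range(H)] for _ in range(C)]
    spatial = [[[1.0] * (k * k) for _ in range(W)] for _ in range(H)]
    channel = [[1.0] * (k * k) for _ in range(C)]
    y = ddf_apply(x, spatial, channel, k)
    print(y[0][1][1], y[0][0][0])
    print(filter_value_counts(C, H, W, k))
```

Under these assumptions, the number of filter values grows with the sum of the pixel count and the channel count rather than with their product, which illustrates why the decomposition keeps the cost close to that of depth-wise convolution.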
