Decoupled Dynamic Filter Networks

Convolution is one of the basic building blocks of CNN architectures. Despite its common use, standard convolution has two main shortcomings: it is content-agnostic and computation-heavy. Dynamic filters are content-adaptive but further increase the computational overhead. Depth-wise convolution is a lightweight variant, but it usually leads to a drop in CNN performance or requires a larger number of channels. In this work, we propose the Decoupled Dynamic Filter (DDF), which tackles both shortcomings simultaneously. Inspired by recent advances in attention, DDF decouples a depth-wise dynamic filter into spatial and channel dynamic filters. This decomposition considerably reduces the number of parameters and limits computational costs to the same level as depth-wise convolution. Meanwhile, we observe a significant boost in performance when replacing standard convolution with DDF in classification networks: ResNet50 and ResNet101 improve by 1.9% and 1.3% in top-1 accuracy, respectively, while their computational costs are reduced by nearly half. Experiments on detection and joint upsampling networks also demonstrate the superior performance of the DDF upsampling variant (DDF-Up) in comparison with standard convolution and specialized content-adaptive layers. The project page with code is available online.
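To make the decoupling concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the two filters are already given: one k x k spatial filter per pixel (shared across channels) and one k x k channel filter per channel (shared across pixels), combined by element-wise multiplication. It also assumes an odd kernel size and zero padding. The function names and shapes are hypothetical, and the branches that would predict the filters from the input are omitted.

```python
import numpy as np

def decoupled_dynamic_filter(x, spatial_filter, channel_filter):
    """Apply a per-pixel spatial filter combined with a per-channel filter.

    x:              (C, H, W) input feature map
    spatial_filter: (H, W, k, k), one k x k filter per pixel (assumed given)
    channel_filter: (C, k, k), one k x k filter per channel (assumed given)
    """
    C, H, W = x.shape
    k = spatial_filter.shape[-1]  # assumed odd
    pad = k // 2
    x_pad = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    out = np.zeros_like(x, dtype=float)
    for u in range(k):
        for v in range(k):
            # Shifted view of the input for kernel offset (u, v): (C, H, W)
            window = x_pad[:, u:u + H, v:v + W]
            # Combined weight = channel weight * spatial weight, via broadcasting
            weight = (channel_filter[:, u, v][:, None, None]
                      * spatial_filter[:, :, u, v][None, :, :])
            out += weight * window
    return out

def filter_value_counts(C, H, W, k):
    """Filter values needed for one feature map: full vs. decoupled."""
    full = C * H * W * k * k               # one k x k filter per pixel per channel
    decoupled = H * W * k * k + C * k * k  # spatial filters + channel filters
    return full, decoupled

print(filter_value_counts(64, 56, 56, 3))  # (1806336, 28800)
```

For a 64-channel 56 x 56 feature map with 3 x 3 kernels, the decoupled form needs 28,800 filter values instead of 1,806,336, which illustrates why the decomposition keeps the cost close to that of depth-wise convolution.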
