Resolution Learning in Deep Convolutional Networks Using Scale-Space Theory

The resolution of a deep convolutional neural network (CNN) is typically bounded by its receptive field size, which is set by the filter sizes and by the subsampling layers or strided convolutions applied to the feature maps. The optimal resolution may vary significantly depending on the dataset. Modern CNNs hard-code their resolution hyper-parameters in the network architecture, which makes tuning them cumbersome. We propose to do away with hard-coded resolution hyper-parameters and instead learn the appropriate resolution from data. We use scale-space theory to obtain a self-similar parametrization of filters and make use of the N-Jet: a truncated Taylor series that approximates a filter by a learned combination of Gaussian derivative filters. The parameter $\sigma$ of the Gaussian basis controls both the amount of detail the filter encodes and the spatial extent of the filter. Since $\sigma$ is a continuous parameter, we can optimize it with respect to the loss. The proposed N-Jet layer achieves comparable performance when used in state-of-the-art architectures, while learning the correct resolution in each layer automatically. We evaluate our N-Jet layer on both classification and segmentation, and we show that learning $\sigma$ is especially beneficial when dealing with inputs at multiple sizes.
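
The idea of a filter built from a Gaussian derivative basis can be illustrated with a minimal sketch. It is not the paper's implementation: it is one-dimensional, stops at second order, and does not train anything. The names `gaussian_derivative_basis`, `njet_filter`, `alphas`, and `truncate` are assumptions introduced here for illustration. The sketch shows how a single continuous $\sigma$ sets both the smoothness of the basis and the width of the sampled filter.

```python
import math


def gaussian_derivative_basis(sigma, max_order=2, truncate=3.0):
    """Sample 1D Gaussian derivatives of order 0..max_order on an integer grid.

    The grid radius grows with sigma (assumed cut-off: truncate * sigma),
    so sigma controls both the detail and the spatial extent of the basis.
    """
    if max_order > 2:
        raise ValueError("this sketch only implements orders 0, 1 and 2")
    radius = max(1, math.ceil(truncate * sigma))
    xs = list(range(-radius, radius + 1))
    basis = []
    for order in range(max_order + 1):
        row = []
        for x in xs:
            g = math.exp(-x * x / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)
            if order == 0:
                factor = 1.0                                  # G
            elif order == 1:
                factor = -x / sigma ** 2                      # dG/dx
            else:
                factor = (x * x - sigma ** 2) / sigma ** 4    # d^2G/dx^2
            row.append(factor * g)
        basis.append(row)
    return xs, basis


def njet_filter(alphas, sigma, truncate=3.0):
    """Linear combination of Gaussian derivatives with coefficients alphas."""
    xs, basis = gaussian_derivative_basis(sigma, len(alphas) - 1, truncate)
    return [sum(a * row[i] for a, row in zip(alphas, basis)) for i in range(len(xs))]


# The same coefficients yield a wider, smoother filter as sigma grows.
narrow = njet_filter([1.0, 0.5, -0.2], sigma=1.0)
wide = njet_filter([1.0, 0.5, -0.2], sigma=2.0)
print(len(narrow), len(wide))
```

In an automatic-differentiation framework, `alphas` and `sigma` would both be trainable parameters, so the gradient of the loss with respect to `sigma` could adjust the filter's resolution during training.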
