Frequency learning for structured CNN filters with Gaussian fractional derivatives

Frequency information is fundamental to discriminating between textures, and therefore between different objects. Classical CNN architectures limit the frequencies they can learn through fixed filter sizes, and offer no way to control them explicitly. Here, we build on structured receptive field filters with a Gaussian derivative basis. Rather than using predetermined derivative orders, which typically give the basis functions fixed frequency responses, we learn the orders. We show that by learning the order of the basis we can accurately learn the frequency of the filters, and hence adapt to the optimal frequencies for the underlying learning task. We use the well-founded mathematical formulation of fractional derivatives to adapt the filter frequencies during training. Our formulation leads to parameter savings and data efficiency when compared to standard CNNs and to the Gaussian derivative CNN filter networks that we build upon.
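To make the link between derivative order and filter frequency concrete, the following is a minimal sketch in one dimension. It assumes a common Fourier-domain definition of a fractional Gaussian derivative, whose magnitude response is |ω|^ν exp(-σ²ω²/2) for order ν and scale σ; this may differ from the exact formulation used in the paper. Under that assumption the response peaks at ω = sqrt(ν)/σ, so changing ν continuously shifts the frequency the filter is tuned to. The function names `spectrum`, `analytic_peak`, and `numeric_peak` are hypothetical and chosen only for illustration.

```python
import math


def spectrum(omega, order, sigma=1.0):
    """Magnitude response |omega|^order * exp(-(sigma*omega)^2 / 2).

    Assumed Fourier-domain form of a fractional Gaussian derivative;
    not necessarily the formulation used in the paper.
    """
    return abs(omega) ** order * math.exp(-0.5 * (sigma * omega) ** 2)


def analytic_peak(order, sigma=1.0):
    """Frequency that maximizes the response: sqrt(order) / sigma."""
    return math.sqrt(order) / sigma


def numeric_peak(order, sigma=1.0, omega_max=6.0, steps=6000):
    """Locate the peak by brute-force search on a uniform frequency grid."""
    grid = [omega_max * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda w: spectrum(w, order, sigma))


if __name__ == "__main__":
    # Integer orders (1, 2) are the classical Gaussian derivatives;
    # the fractional orders in between fill the gaps in peak frequency.
    for order in (0.5, 1.0, 1.5, 2.0):
        print(order, analytic_peak(order), numeric_peak(order))
```

With integer orders only, the peak frequency can take only the discrete values sqrt(1)/σ, sqrt(2)/σ, and so on; treating ν as a continuous, trainable parameter removes that restriction, which is the intuition behind learning the order.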
