Spectral Leakage and Rethinking the Kernel Size in CNNs

Convolutional layers in CNNs implement linear filters, which decompose the input into different frequency bands. However, most modern architectures neglect standard principles of filter design when choosing the size and shape of their convolutional kernels. In this work, we consider the well-known problem of spectral leakage, caused by windowing artifacts in filtering operations, in the context of CNNs. We show that the small size of CNN kernels makes them susceptible to spectral leakage, which may induce performance-degrading artifacts. To address this issue, we propose the use of larger kernel sizes together with the Hamming window function to alleviate leakage in CNN architectures. Through the simple use of a standard window function in convolutional layers, we demonstrate improved classification accuracy over baselines with conventional 3 × 3 kernels on multiple benchmark datasets, including FashionMNIST, CIFAR-10, CIFAR-100 and ImageNet. Finally, we show that CNNs employing the Hamming window display increased robustness against certain types of adversarial attacks. Code available at¹.
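
To make the proposal concrete, below is a minimal PyTorch sketch of a Hamming-windowed convolution, written from the abstract's description rather than the paper's released code. The class name HammingConv2d and all hyperparameters are illustrative assumptions: each kernel is tapered by a fixed, separable 2D Hamming window before the convolution is applied.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HammingConv2d(nn.Conv2d):
        # Conv2d whose kernels are tapered by a fixed 2D Hamming window.
        # The taper attenuates the kernel's edges, countering the spectral
        # leakage introduced by the implicit rectangular window of a small,
        # finite kernel.
        def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
            super().__init__(in_channels, out_channels, kernel_size, **kwargs)
            # Separable 2D window: outer product of two 1D Hamming windows.
            w = torch.hamming_window(kernel_size, periodic=False)
            self.register_buffer("window", torch.outer(w, w))  # fixed, not learned

        def forward(self, x):
            # Window every kernel before convolving; broadcasting applies the
            # (k, k) window across all (out_channels, in_channels) kernels.
            return F.conv2d(x, self.weight * self.window, self.bias,
                            self.stride, self.padding, self.dilation, self.groups)

    # Usage: a larger windowed kernel in place of a conventional 3 x 3 layer.
    layer = HammingConv2d(in_channels=3, out_channels=64, kernel_size=7, padding=3)
    out = layer(torch.randn(1, 3, 32, 32))  # -> shape (1, 64, 32, 32)

Since the window is registered as a buffer rather than a parameter, only the underlying kernel weights are trained; setting kernel_size=3 and padding=1 recovers the baseline layer's footprint, which makes window-versus-no-window ablations straightforward.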
