Why Convolutional Networks Learn Oriented Bandpass Filters: A Hypothesis

It has been repeatedly observed that convolutional architectures, when applied to image understanding tasks, learn oriented bandpass filters. A standard explanation is that these filters reflect the structure of the images seen during training: natural images are typically composed locally of oriented contours at various scales, and oriented bandpass filters are matched to such structure. The present paper offers an alternative explanation based not on the structure of images but on the structure of convolutional architectures themselves. In particular, complex exponentials are the eigenfunctions of convolution. These eigenfunctions are defined globally, whereas convolutional architectures operate locally. Enforcing locality by applying a windowing function to the eigenfunctions leads to oriented bandpass filters as the natural operators to be learned by convolutional architectures. From a representational point of view, such filters provide a locally systematic way to characterize and operate on an image or other signal.
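A minimal sketch of this line of reasoning, with notation introduced here for illustration rather than taken from the paper: for a convolution with kernel $h$, complex exponentials are eigenfunctions,

\[
(h * e_{\boldsymbol{\omega}})(\mathbf{x})
  = \int h(\mathbf{u})\, e^{i\,\boldsymbol{\omega}^{\top}(\mathbf{x}-\mathbf{u})}\, d\mathbf{u}
  = \hat{h}(\boldsymbol{\omega})\, e^{i\,\boldsymbol{\omega}^{\top}\mathbf{x}},
\]

with eigenvalue $\hat{h}(\boldsymbol{\omega})$, the Fourier transform of the kernel at frequency $\boldsymbol{\omega}$. Localizing such an eigenfunction with a window, e.g. a Gaussian $g_{\sigma}$, yields

\[
f_{\boldsymbol{\omega},\sigma}(\mathbf{x}) = g_{\sigma}(\mathbf{x})\, e^{i\,\boldsymbol{\omega}^{\top}\mathbf{x}},
\]

a Gabor-like function: a filter that is bandpass about spatial frequency $\lVert\boldsymbol{\omega}\rVert$ and oriented along $\boldsymbol{\omega}/\lVert\boldsymbol{\omega}\rVert$, which is the oriented bandpass form referred to above.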
