Energy Propagation in Deep Convolutional Neural Networks

Many practical machine learning tasks employ very deep convolutional neural networks. Such large depths pose formidable computational challenges in training and operating the network. It is therefore important to understand how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers. In addition, it is desirable that the feature extractor generated by the network be informative in the sense that the only signal mapping to the all-zeros feature vector is the zero input signal. This “trivial null-set” property can be guaranteed by requiring “energy conservation,” meaning that the energy in the feature vector is proportional to that of the corresponding input signal. This paper establishes conditions for energy conservation (and thus for a trivial null-set) for a wide class of deep convolutional neural network-based feature extractors and characterizes the corresponding feature map energy decay rates. Specifically, we consider general scattering networks employing the modulus non-linearity and find that, under mild analyticity and high-pass conditions on the filters (which encompass, inter alia, various constructions of Weyl-Heisenberg filters, wavelets, ridgelets, $\alpha$-curvelets, and shearlets), the feature map energy decays at least polynomially fast. For broad families of wavelets and Weyl-Heisenberg filters, the guaranteed decay rate is shown to be exponential. Moreover, we provide handy estimates of the number of layers needed for at least $((1-\varepsilon)\cdot 100)\%$ of the input signal energy to be contained in the feature vector.
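
To make the last point concrete, the following is a minimal sketch of how such a layer-count estimate can be read off from an exponential decay bound. It assumes a simplified, hypothetical model in which the energy not yet captured by the feature vector after $N$ layers is at most $a^{-N}$ times the input energy, for some base $a > 1$. The function name `layers_needed`, the base `a`, and the constant-free form of the bound are illustrative assumptions and are not the constants or the exact statement derived in the paper.

```python
def layers_needed(eps: float, a: float) -> int:
    """Smallest N such that a**(-N) <= eps.

    Hypothetical model (an assumption, not the paper's bound): the fraction
    of input energy still missing from the feature vector after N layers is
    at most a**(-N). Under that model, N layers capture at least
    (1 - eps) * 100 percent of the input signal energy.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    if a <= 1.0:
        raise ValueError("decay base a must be greater than 1")

    n = 0
    residual = 1.0  # upper bound on the uncaptured energy fraction
    while residual > eps:
        residual /= a  # one more layer shrinks the bound by a factor of a
        n += 1
    return n


if __name__ == "__main__":
    # With the illustrative base a = 2, capturing 99% of the energy
    # (eps = 0.01) requires 7 layers, since 2**-7 = 0.0078125 <= 0.01.
    print(layers_needed(0.01, 2.0))
```

Under this model the required depth grows only logarithmically in $1/\varepsilon$, which is why an exponential decay guarantee is much more favorable than a polynomial one when estimating how deep a network needs to be.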
