Spectral Roll-off Points: Estimating Useful Information Under the Basis of Low-frequency Data Representations

Useful information is the basis for model decisions, and estimating it in feature maps promotes the understanding of how neural networks work. Low frequency is a prerequisite for useful information in data representations, because downscaling operations reduce the communication bandwidth. This study proposes spectral roll-off points (SROPs) to incorporate this low-frequency condition when estimating useful information. The computation of an SROP is extended from a 1-D signal to a 2-D image so as to satisfy the rotation invariance required by image classification tasks, and SROP statistics across feature maps provide a layer-wise estimate of useful information. Sanity checks demonstrate that the variation of layer-wise SROP distributions across model inputs can be used to recognize the useful components that support model decisions. Moreover, the variations of SROPs and of accuracy, the latter serving as the ground truth of a model's useful information, are synchronized across various model structures when training is sufficient. SROP is therefore an accurate and convenient estimate of useful information, and it promotes the explainability of artificial intelligence with frequency-domain knowledge.

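The abstract does not spell out the exact computation, but a minimal sketch of what a rotation-invariant 2-D SROP and its layer-wise statistic could look like is given below. It assumes the common 1-D roll-off convention (the frequency below which a fixed fraction, e.g. 85%, of spectral energy is concentrated) and uses a radially integrated 2-D power spectrum to obtain rotation invariance. The function names, the 85% threshold, and the normalization are illustrative assumptions, not the paper's definitive implementation.

```python
import numpy as np

def srop_2d(feature_map, energy_ratio=0.85):
    """Sketch of a rotation-invariant 2-D spectral roll-off point.

    Assumption: the SROP is the radial frequency below which `energy_ratio`
    of the total spectral energy of the feature map is concentrated; the
    0.85 default mirrors the classic 1-D audio roll-off convention and may
    differ from the paper's choice.
    """
    # Centered 2-D power spectrum of the feature map.
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(feature_map))) ** 2

    # Radial distance of every frequency bin from the spectrum center,
    # so that the measure does not depend on image orientation.
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2).astype(int)

    # Total spectral energy at each integer radius.
    radial_energy = np.bincount(radius.ravel(), weights=spectrum.ravel())

    # SROP = smallest radius whose cumulative energy reaches the threshold.
    cumulative = np.cumsum(radial_energy)
    rolloff_bin = np.searchsorted(cumulative, energy_ratio * cumulative[-1])

    # Normalize by the largest radius so the SROP lies in [0, 1].
    return rolloff_bin / max(len(radial_energy) - 1, 1)

def layer_srop(activation, energy_ratio=0.85):
    """Layer-wise statistic: mean SROP over all channels of one activation.

    `activation` is assumed to be an array of shape (channels, height, width);
    the mean is one of several statistics one could aggregate per layer.
    """
    return float(np.mean([srop_2d(ch, energy_ratio) for ch in activation]))
```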