Theory of the Frequency Principle for General Deep Neural Networks

Alongside the fruitful application of Deep Neural Networks (DNNs) to realistic problems, recent empirical studies have reported a universal phenomenon, the Frequency Principle (F-Principle): a DNN tends to learn a target function from low to high frequencies during training. The F-Principle has proved useful for both qualitative and quantitative understanding of DNNs. In this paper, we rigorously investigate the F-Principle for the training dynamics of a general DNN at three stages: the initial stage, the intermediate stage, and the final stage. For each stage, we provide a theorem in terms of quantities that characterize the F-Principle. Our results are general in the sense that they hold for multilayer networks with general activation functions, general population densities of the data, and a large class of loss functions. Our work lays a theoretical foundation for the F-Principle and a better understanding of the training process of DNNs.
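
As an illustrative aside (not part of the paper itself), the low-to-high-frequency learning order described above can be observed with a short numerical experiment: train a small two-layer tanh network on a one-dimensional target containing a few frequencies and track the relative error of each Fourier component over training. The sketch below is a minimal, self-contained assumption of such a setup; the network size, initialization, learning rate, and step counts are all illustrative choices, not values taken from the paper.

```python
# Minimal sketch (illustrative assumptions throughout): observe the
# F-Principle by tracking the relative error of each target frequency
# component of a two-layer tanh network's prediction during training.
import numpy as np

rng = np.random.default_rng(0)

# Target: sum of a low-, medium-, and high-frequency sine wave on [-1, 1).
freqs = np.array([1.0, 3.0, 5.0])                  # cycles per unit length
x = np.linspace(-1.0, 1.0, 256, endpoint=False).reshape(-1, 1)
y = np.sum(np.sin(2 * np.pi * freqs * x), axis=1, keepdims=True)

# Two-layer tanh network with scalar output (width and scales are assumptions).
m = 200
W1 = rng.normal(scale=2.0, size=(1, m)); b1 = np.zeros((1, m))
W2 = rng.normal(scale=1.0 / np.sqrt(m), size=(m, 1)); b2 = np.zeros((1, 1))

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

def fourier_error(pred):
    """Relative error of each target frequency component of the prediction."""
    k = np.fft.rfftfreq(len(x), d=x[1, 0] - x[0, 0])   # cycles per unit
    Fy = np.fft.rfft(y[:, 0])
    Fp = np.fft.rfft(pred[:, 0])
    idx = [np.argmin(np.abs(k - f)) for f in freqs]
    return np.abs(Fp[idx] - Fy[idx]) / np.abs(Fy[idx])

lr = 0.05
for step in range(20001):
    h, pred = forward(x)
    err = pred - y                        # gradient of MSE, up to a constant
    gW2 = h.T @ err / len(x)
    gb2 = err.mean(axis=0, keepdims=True)
    gh = err @ W2.T * (1 - h ** 2)        # backprop through tanh
    gW1 = x.T @ gh / len(x)
    gb1 = gh.mean(axis=0, keepdims=True)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
    if step % 5000 == 0:
        print(step, np.round(fourier_error(pred), 3))
```

Under the F-Principle, the relative error of the lowest-frequency component is expected to decay earliest in such a run, with the higher-frequency components converging later; the precise timing depends on the assumed initialization and learning rate.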
