Nonlinear Weighted Directed Acyclic Graph and A Priori Estimates for Neural Networks

In an attempt to better understand the structural benefits and generalization power of deep neural networks, we first present a novel graph-theoretical formulation of neural network models, covering fully connected networks, residual networks (ResNet), and densely connected networks (DenseNet). Second, we extend the error analysis of the population risk for two-layer networks \cite{ew2019prioriTwo} and ResNet \cite{e2019prioriRes} to DenseNet, and further show that similar estimates can be obtained for neural networks satisfying certain mild conditions. These estimates are a priori in nature, since they depend solely on information available prior to training; in particular, the bounds on the estimation error are independent of the input dimension.
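To make the graph viewpoint concrete, the following is a minimal sketch of the idea rather than the paper's actual construction: a network is a weighted DAG whose non-input nodes apply a nonlinearity to the weighted sum over their in-edges, and the three architectures differ only in which edges are present. All names here (`forward_dag`, `edges`) are illustrative, and nodes are taken to be scalar-valued for simplicity.

```python
def relu(x):
    """ReLU nonlinearity applied at every non-input node."""
    return max(x, 0.0)

def forward_dag(x, edges, num_nodes):
    """Evaluate a nonlinear weighted DAG with scalar-valued nodes.

    Nodes 0..len(x)-1 hold the input coordinates; every later node v
    applies the nonlinearity to the weighted sum over its in-edges.
    edges[v] is a list of (u, w) pairs with u < v, so the graph is
    acyclic and nodes can be evaluated in topological order.
    """
    h = list(x) + [0.0] * (num_nodes - len(x))
    for v in range(len(x), num_nodes):
        h[v] = relu(sum(w * h[u] for u, w in edges[v]))
    return h[num_nodes - 1]  # last node is the output

# The architectures differ only in their edge sets:
# - fully connected: each node is wired only to the previous layer;
# - ResNet: additional identity (skip) edges bypass each block;
# - DenseNet: each node receives edges from all earlier layers.
# A tiny DenseNet-style example with one input node and weight 0.5 everywhere:
num_nodes = 5
edges = {v: [(u, 0.5) for u in range(v)] for v in range(1, num_nodes)}
print(forward_dag([1.0], edges, num_nodes))
```

Roughly speaking, the a priori estimates of \cite{ew2019prioriTwo,e2019prioriRes} control the population risk through a (path-type) norm of the edge weights, which is why mild structural conditions on the underlying graph suffice for the extension.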

[1] Weinan E, et al. A priori estimates for classification problems using neural networks, 2020, arXiv.

[2] Weinan E, et al. Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't, 2020, CSIAM Transactions on Applied Mathematics.

[3] Weinan E, et al. Representation formulas and pointwise properties for Barron functions, 2020, Calculus of Variations and Partial Differential Equations.

[4] Weinan E, et al. Kolmogorov width decay and poor approximators in machine learning: shallow neural networks, random feature models and neural tangent kernels, 2020, Research in the Mathematical Sciences.

[5] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.

[6] Quanquan Gu, et al. A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks, 2020, NeurIPS.

[7] Joan Bruna, et al. Spurious Valleys in One-hidden-layer Neural Network Optimization Landscapes, 2019, J. Mach. Learn. Res.

[8] Weinan E, et al. A Priori Estimates of the Population Risk for Residual Networks, 2019, arXiv.

[9] Quanquan Gu, et al. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks, 2019, AAAI.

[10] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.

[11] Yuanzhi Li, et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018, NeurIPS.

[12] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.

[13] Ohad Shamir, et al. Size-Independent Sample Complexity of Neural Networks, 2017, COLT.

[14] Tomaso A. Poggio, et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks, 2017, AISTATS.

[15] Behnam Neyshabur, et al. Implicit Regularization in Deep Learning, 2017, arXiv.

[16] David A. McAllester, et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks, 2017, ICLR.

[17] Nathan Srebro, et al. Exploring Generalization in Deep Learning, 2017, NIPS.

[18] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.

[19] Amit Daniely, et al. SGD Learns the Conjugate Kernel Class of the Network, 2017, NIPS.

[20] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.

[21] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2017, CVPR.

[22] Gregory Shakhnarovich, et al. FractalNet: Ultra-Deep Neural Networks without Residuals, 2016, ICLR.

[23] Kilian Q. Weinberger, et al. Deep Networks with Stochastic Depth, 2016, ECCV.

[24] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.

[25] Jürgen Schmidhuber, et al. Training Very Deep Networks, 2015, NIPS.

[26] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.

[27] Balázs Keszegh, et al. Topological orderings of weighted directed acyclic graphs, 2013, Inf. Process. Lett.

[28] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.

[29] Michael Taylor. Pseudodifferential Operators and Nonlinear PDE, 1991.

[30] Richard Bellman. Adaptive Control Processes: A Guided Tour, 1961, The Mathematical Gazette.

[31] Lei Wu, et al. A priori estimates of the population risk for two-layer neural networks, 2018, Communications in Mathematical Sciences.