A Revision of Neural Tangent Kernel-based Approaches for Neural Networks

Recent theoretical works based on the neural tangent kernel (NTK) have shed light on the optimization and generalization of over-parameterized networks, partially bridging the gap between their practical success and classical learning theory. In particular, the NTK-based approach has yielded three representative results: (1) a training error bound showing that networks can perfectly fit any finite training sample, with a tighter characterization of training speed that depends on the complexity of the data; (2) a generalization error bound, independent of network size, based on a data-dependent complexity measure (CMD), from which it follows that networks can generalize arbitrary smooth functions; and (3) a simple, analytic kernel function shown to be equivalent to a fully trained network, which outperforms both its corresponding network and the existing gold standard, Random Forests, in few-shot learning. For all of these results to hold, the network scaling factor $\kappa$ must decrease with the sample size $n$. However, we prove that when $\kappa$ decreases with $n$, the aforementioned results become surprisingly erroneous, because the output of the trained network converges to zero as $\kappa$ decreases. To solve this problem, we tighten the key bounds by essentially removing the $\kappa$-affected values. Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
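The scaling issue described above can be illustrated with a minimal numerical sketch. It assumes the two-layer ReLU parameterization commonly used in NTK-based analyses, $f(x) = \frac{\kappa}{\sqrt{m}} \sum_{r=1}^{m} a_r \, \sigma(w_r^\top x)$, where $\kappa$ multiplies the network output; the function name `two_layer_output`, the width, and the random initialization below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def two_layer_output(x, kappa, m=10_000, seed=0):
    """Output of a randomly initialized two-layer ReLU network scaled by kappa.

    Sketch of the standard NTK parameterization (an assumption):
        f(x) = (kappa / sqrt(m)) * sum_r a_r * relu(w_r . x),
    with w_r ~ N(0, I) and a_r ~ Unif{-1, +1}.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    W = rng.standard_normal((m, d))        # hidden-layer weights w_r
    a = rng.choice([-1.0, 1.0], size=m)    # output-layer weights a_r
    hidden = np.maximum(W @ x, 0.0)        # ReLU activations
    return kappa / np.sqrt(m) * (a @ hidden)

x = np.ones(10) / np.sqrt(10)              # a unit-norm input
for kappa in (1.0, 0.1, 0.01):
    print(f"kappa = {kappa:5.2f}  ->  |f(x)| = {abs(two_layer_output(x, kappa)):.4f}")
```

Under this parameterization the output magnitude scales linearly with $\kappa$, so driving $\kappa$ down with $n$ drives the network output toward zero; this is the behavior that motivates removing the $\kappa$-affected terms from the key bounds.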
