[1] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[2] Samet Oymak, et al. Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks, 2019, IEEE Journal on Selected Areas in Information Theory.
[3] Tengyuan Liang, et al. On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels, 2019, COLT.
[4] A. Montanari, et al. The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime, 2019.
[5] Philip M. Long, et al. Finite-sample analysis of interpolating linear classifiers in the overparameterized regime, 2020, arXiv.
[6] Yuan Cao, et al. A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks, 2020, NeurIPS.
[7] Roman Vershynin, et al. High-Dimensional Probability, 2018.
[8] Matus Telgarsky, et al. Directional convergence and alignment in deep learning, 2020, NeurIPS.
[9] Philip M. Long, et al. Failures of model-dependent generalization bounds for least-norm interpolation, 2020, J. Mach. Learn. Res.
[10] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[11] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[12] Anant Sahai, et al. Harmless interpolation of noisy data in regression, 2019, IEEE International Symposium on Information Theory (ISIT).
[13] Quoc V. Le, et al. Searching for Activation Functions, 2018, arXiv.
[14] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, arXiv.
[15] Colin Wei, et al. Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel, 2018, NeurIPS.
[16] Hongyang Zhang, et al. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations, 2017, COLT.
[17] Philip M. Long, et al. Benign overfitting in linear regression, 2019, Proceedings of the National Academy of Sciences.
[18] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[19] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[20] Sanjeev Arora, et al. Implicit Regularization in Deep Matrix Factorization, 2019, NeurIPS.
[21] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[22] Rongjie Lai, et al. Optimizing Mode Connectivity via Neuron Alignment, 2020, NeurIPS.
[23] Xin-She Yang, et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.
[24] Ohad Shamir, et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks, 2017, ICML.
[25] Francis Bach, et al. Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss, 2020, COLT.
[26] Martin J. Wainwright, et al. On the Learnability of Fully-Connected Neural Networks, 2017, AISTATS.
[27] Alexandr Andoni, et al. Learning Polynomials with Neural Networks, 2014, ICML.
[28] Tengyuan Liang, et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize, 2018, The Annals of Statistics.
[29] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[30] Matus Telgarsky, et al. The implicit bias of gradient descent on nonseparable data, 2019, COLT.
[31] Francis Bach, et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport, 2018, NeurIPS.
[32] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[33] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[34] Arthur Jacot, et al. Neural tangent kernel: convergence and generalization in neural networks, 2018, NeurIPS.
[35] Andrea Montanari, et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve, 2019, Communications on Pure and Applied Mathematics.
[36] Yuanzhi Li, et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation, 2017, NIPS.
[37] Tengyu Ma, et al. Learning One-hidden-layer Neural Networks with Landscape Design, 2017, ICLR.
[38] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[39] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[40] Tengyuan Liang, et al. A Precise High-Dimensional Asymptotic Theory for Boosting and Min-L1-Norm Interpolated Classifiers, 2020, SSRN Electronic Journal.
[41] Matus Telgarsky, et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[42] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[43] Matus Telgarsky, et al. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks, 2020, ICLR.
[44] Inderjit S. Dhillon, et al. Recovery Guarantees for One-hidden-layer Neural Networks, 2017, ICML.
[45] Amir Globerson, et al. Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem, 2018, ICML.
[46] A. Tsigler, et al. Benign overfitting in ridge regression, 2020.
[47] Ji Xu, et al. On the proliferation of support vectors in high dimensions, 2020, arXiv.
[48] Rina Panigrahy, et al. Convergence Results for Neural Networks via Electrodynamics, 2017, ITCS.
[49] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[50] Xiao Zhang, et al. Learning One-hidden-layer ReLU Networks via Gradient Descent, 2018, AISTATS.
[51] Martin J. Wainwright, et al. High-Dimensional Statistics, 2019.
[52] Andrea Montanari, et al. Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit, 2019, COLT.
[53] Nathan Srebro, et al. Implicit Bias of Gradient Descent on Linear Convolutional Networks, 2018, NeurIPS.
[54] Mikhail Belkin, et al. Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate, 2018, NeurIPS.
[55] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[56] Mikhail Belkin, et al. Classification vs regression in overparameterized regimes: Does the loss function matter?, 2020, arXiv.