Benign Overfitting in Two-layer Convolutional Neural Networks
[1] Niladri S. Chatterji, et al. Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data, 2022, COLT.
[2] Andrea Montanari, et al. The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training, 2020, The Annals of Statistics.
[3] Yuanzhi Li, et al. Feature Purification: How Adversarial Training Performs Robust Deep Learning, 2020, FOCS.
[4] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, The Annals of Statistics.
[5] Quanquan Gu, et al. Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization, 2021, ICLR.
[6] Zhi-Hua Zhou, et al. Towards an Understanding of Benign Overfitting in Neural Networks, 2021, arXiv.
[7] Mikhail Belkin, et al. Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures, 2021, NeurIPS.
[8] Vladimir Braverman, et al. Benign Overfitting of Constant-Stepsize SGD for Linear Regression, 2021, COLT.
[9] Quanquan Gu, et al. Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise, 2021, ICML.
[10] Philip M. Long, et al. Finite-sample analysis of interpolating linear classifiers in the overparameterized regime, 2020, J. Mach. Learn. Res.
[11] Yuan Cao, et al. How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?, 2019, ICLR.
[12] Andrea Montanari, et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve, 2019, Communications on Pure and Applied Mathematics.
[13] Yuanzhi Li, et al. Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning, 2020, ICLR.
[14] Jeffrey Pennington, et al. The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization, 2020, ICML.
[15] Ji Xu, et al. On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression, 2020, NeurIPS.
[16] Zhenyu Liao, et al. A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent, 2020, NeurIPS.
[17] Matus Telgarsky, et al. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks, 2019, ICLR.
[18] Philip M. Long, et al. Benign overfitting in linear regression, 2019, Proceedings of the National Academy of Sciences.
[19] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[20] Mikhail Belkin, et al. Two models of double descent for weak features, 2019, SIAM J. Math. Data Sci.
[21] Yuan Cao, et al. Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks, 2019, NeurIPS.
[22] Colin Wei, et al. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks, 2019, NeurIPS.
[23] Yuan Cao, et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks, 2019, NeurIPS.
[24] Ruosong Wang, et al. On Exact Computation with an Infinitely Wide Neural Net, 2019, NeurIPS.
[25] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent, 2019, NeurIPS.
[26] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[27] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[28] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, arXiv.
[29] Yuanzhi Li, et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018, NeurIPS.
[30] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[31] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[32] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[33] Xiao Zhang, et al. Learning One-hidden-layer ReLU Networks via Gradient Descent, 2018, AISTATS.
[34] Yann LeCun, et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, 2018, arXiv.
[35] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[36] Tengyuan Liang, et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize, 2018, The Annals of Statistics.
[37] Arthur Jacot, et al. Neural tangent kernel: convergence and generalization in neural networks, 2018, NeurIPS.
[38] Bin Yu, et al. Stability and Convergence Trade-off of Iterative Optimization Algorithms, 2018, arXiv.
[39] Yi Zhang, et al. Stronger generalization bounds for deep nets via a compression approach, 2018, ICML.
[40] Mikhail Belkin, et al. To understand deep learning we need to understand kernel learning, 2018, ICML.
[41] Ohad Shamir, et al. Size-Independent Sample Complexity of Neural Networks, 2017, COLT.
[42] Yuandong Tian, et al. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima, 2017, ICML.
[43] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[44] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[45] Yuandong Tian, et al. When is a Convolutional Filter Easy To Learn?, 2017, ICLR.
[46] David A. McAllester, et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks, 2017, ICLR.
[47] Kai Zheng, et al. Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints, 2017, COLT.
[48] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[49] Inderjit S. Dhillon, et al. Recovery Guarantees for One-hidden-layer Neural Networks, 2017, ICML.
[50] Mahdi Soltanolkotabi, et al. Learning ReLUs via Gradient Descent, 2017, NIPS.
[51] Yuanzhi Li, et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation, 2017, NIPS.
[52] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[53] Yoram Singer, et al. Train faster, generalize better: Stability of stochastic gradient descent, 2015, ICML.
[54] Ryota Tomioka, et al. Norm-Based Capacity Control in Neural Networks, 2015, COLT.
[55] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.
[56] Alexander J. Smola, et al. Neural Information Processing Systems, 1997, NIPS 1997.