[1] Anastasios Kyrillidis, et al. Minimum norm solutions do not always generalize well for over-parameterized problems, 2018, ArXiv.
[2] Yoshua Bengio, et al. On the Spectral Bias of Neural Networks, 2018, ICML.
[3] Samy Bengio, et al. Identity Crisis: Memorization and Generalization under Extreme Overparameterization, 2019, ICLR.
[4] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[5] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, IEEE International Conference on Computer Vision (ICCV).
[6] Jaehoon Lee, et al. Wide neural networks of any depth evolve as linear models under gradient descent, 2019, NeurIPS.
[7] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[8] Richard E. Turner, et al. Gaussian Process Behaviour in Wide Deep Neural Networks, 2018, ICLR.
[9] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[10] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[11] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[12] Yuanzhi Li, et al. On the Convergence Rate of Training Recurrent Neural Networks, 2018, NeurIPS.
[13] Aaron Mishkin, et al. To Each Optimizer a Norm, To Each Norm its Generalization, 2020, ArXiv.
[14] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[15] Yoram Singer, et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, 2016, NIPS.
[16] Julien Mairal, et al. On the Inductive Bias of Neural Tangent Kernels, 2019, NeurIPS.
[17] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[18] Samuel S. Schoenholz, et al. Disentangling Trainability and Generalization in Deep Neural Networks, 2020, ICML.
[19] Amit Daniely, et al. SGD Learns the Conjugate Kernel Class of the Network, 2017, NIPS.
[20] Mikhail Belkin, et al. Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate, 2018, NeurIPS.
[21] Shun-ichi Amari, et al. When Does Preconditioning Help or Hurt Generalization?, 2021, ICLR.
[22] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, ArXiv.
[23] Anastasios Kyrillidis, et al. Minimum weight norm models do not always generalize well for over-parameterized problems, 2018.
[24] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[25] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[26] Raef Bassily, et al. The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning, 2017, ICML.
[27] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[28] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[29] Mikhail Belkin, et al. Does data interpolation contradict statistical optimality?, 2018, AISTATS.
[30] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[31] Samet Oymak, et al. Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?, 2018, ICML.
[32] Tengyuan Liang, et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize, 2018, The Annals of Statistics.
[33] Mikhail Belkin, et al. To understand deep learning we need to understand kernel learning, 2018, ICML.
[34] Zheng Xu, et al. The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent, 2019, ICML.
[35] Qian Qian, et al. The Implicit Bias of AdaGrad on Separable Data, 2019, NeurIPS.
[36] Wei Hu, et al. Width Provably Matters in Optimization for Deep Linear Neural Networks, 2019, ICML.
[37] Zheng Ma, et al. A type of generalization error induced by initialization in deep neural networks, 2019, MSML.
[38] Jaehoon Lee, et al. Deep Neural Networks as Gaussian Processes, 2017, ICLR.
[39] Ruosong Wang, et al. On Exact Computation with an Infinitely Wide Neural Net, 2019, NeurIPS.