Sanjeev Arora | Kaifeng Lyu | Runzhe Wang | Zhiyuan Li
[1] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[2] Behnam Neyshabur, et al. Extreme Memorization via Scale of Initialization, 2020, ICLR.
[3] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[4] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[5] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[6] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, ICCV.
[7] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[8] Yuan Cao, et al. How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?, 2019, ICLR.
[9] Nathan Srebro, et al. Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy, 2020, NeurIPS.
[10] Matus Telgarsky, et al. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks, 2020, ICLR.
[11] Francis Bach, et al. Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks, 2019, NeurIPS.
[12] Amit Daniely, et al. The Implicit Bias of Depth: How Incremental Learning Drives Generalization, 2020, ICLR.
[13] Yuan Cao, et al. Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise, 2021, ICML.
[14] Hongyang Zhang, et al. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations, 2017, COLT.
[15] Sanjeev Arora, et al. Implicit Regularization in Deep Matrix Factorization, 2019, NeurIPS.
[16] Nathan Srebro, et al. Implicit Bias of Gradient Descent on Linear Convolutional Networks, 2018, NeurIPS.
[17] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[18] Nathan Srebro, et al. Kernel and Rich Regimes in Overparametrized Models, 2019, COLT.
[19] Hossein Mobahi, et al. Fantastic Generalization Measures and Where to Find Them, 2019, ICLR.
[20] Sylvain Gelly, et al. Gradient Descent Quantizes ReLU Network Features, 2018, arXiv.
[21] Nadav Cohen, et al. Implicit Regularization in Tensor Factorization, 2021, ICML.
[22] Matus Telgarsky, et al. The implicit bias of gradient descent on nonseparable data, 2019, COLT.
[23] Kaifeng Lyu, et al. Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning, 2021, ICLR.
[24] Kalyanmoy Deb, et al. Approximate KKT points and a proximity measure for termination, 2013, J. Glob. Optim.
[25] Mary Phuong, et al. The inductive bias of ReLU networks on orthogonally separable data, 2021, ICLR.
[26] Amir Globerson, et al. Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem, 2018, ICML.
[27] F. Clarke. Generalized gradients and applications, 1975.
[28] Nathan Srebro, et al. Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate, 2018, AISTATS.
[29] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models, 2013.
[30] M. Coste. An Introduction to O-minimal Geometry, 2002.
[31] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[32] Prateek Jain, et al. The Pitfalls of Simplicity Bias in Neural Networks, 2020, NeurIPS.
[33] Joan Bruna, et al. Gradient Dynamics of Shallow Univariate ReLU Networks, 2019, NeurIPS.
[34] Fred Zhang, et al. SGD on Neural Networks Learns Functions of Increasing Complexity, 2019, NeurIPS.
[35] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, arXiv.
[36] Jeffrey Pennington, et al. The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks, 2020, NeurIPS.
[37] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[38] Dmitriy Drusvyatskiy, et al. Stochastic Subgradient Method Converges on Tame Functions, 2018, Foundations of Computational Mathematics.
[39] Yaoyu Zhang, et al. Towards Understanding the Condensation of Two-layer Neural Networks at Initial Training, 2021, arXiv.
[40] J. Bolte, et al. Characterizations of Łojasiewicz inequalities: Subgradient flows, talweg, convexity, 2009.
[41] Yuanzhi Li, et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018, NeurIPS.
[42] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[43] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[44] Amir Globerson, et al. Towards Understanding Learning in Neural Networks with Linear Teachers, 2021, ICML.
[45] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[46] Giulio Biroli, et al. An analytic theory of shallow networks dynamics for hinge loss classification, 2020, NeurIPS.
[47] Francis Bach, et al. Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss, 2020, COLT.
[48] Wei Hu, et al. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced, 2018, NeurIPS.
[49] David A. McAllester, et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks, 2017, ICLR.
[50] Yaoyu Zhang, et al. Phase diagram for two-layer ReLU neural networks at infinite-width limit, 2020, J. Mach. Learn. Res.
[51] Nathan Srebro, et al. Convergence of Gradient Descent on Separable Data, 2018, AISTATS.
[52] Matus Telgarsky, et al. Risk and parameter convergence of logistic regression, 2018, arXiv.
[53] Matus Telgarsky, et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[54] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[55] Yu. S. Ledyaev, et al. Nonsmooth analysis and control theory, 1998.
[56] Matus Telgarsky, et al. Directional convergence and alignment in deep learning, 2020, NeurIPS.