Jingfeng Wu | Difan Zou | Vladimir Braverman | Quanquan Gu
[1] L. Trefethen, et al. Numerical linear algebra, 1997.
[2] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[3] Nathan Srebro, et al. Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy, 2020, NeurIPS.
[4] Colin Wei, et al. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks, 2019, NeurIPS.
[5] Pradeep Ravikumar, et al. Connecting Optimization and Regularization Paths, 2018, NeurIPS.
[6] Peter Richtárik, et al. SGD: General Analysis and Improved Rates, 2019, ICML.
[7] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[8] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[9] Nathan Srebro, et al. Implicit Bias of Gradient Descent on Linear Convolutional Networks, 2018, NeurIPS.
[10] E Weinan, et al. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms, 2015, ICML.
[11] Preetum Nakkiran. Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems, 2020, arXiv.
[12] Raef Bassily, et al. Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses, 2020, NeurIPS.
[13] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[14] Philip M. Long, et al. Benign overfitting in linear regression, 2019, Proceedings of the National Academy of Sciences.
[15] Yoram Singer, et al. Train faster, generalize better: Stability of stochastic gradient descent, 2015, ICML.
[16] Aleksander Madry, et al. The Two Regimes of Deep Network Training, 2020, arXiv.
[17] Lei Wu. How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective, 2018.
[18] Colin Wei, et al. Shape Matters: Understanding the Implicit Bias of the Noise Covariance, 2020, COLT.
[19] Matus Telgarsky, et al. Gradient descent follows the regularization path for general losses, 2020, COLT.
[20] Francis Bach, et al. Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss, 2020, COLT.
[21] Hongyang Zhang, et al. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations, 2017, COLT.
[22] Jinghui Chen, et al. Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization, 2017, NeurIPS.
[23] Nathan Srebro, et al. Convergence of Gradient Descent on Separable Data, 2018, AISTATS.
[24] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.
[25] Matus Telgarsky, et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[26] Roman Vershynin, et al. Introduction to the non-asymptotic analysis of random matrices, 2010, Compressed Sensing.
[27] Dimitris S. Papailiopoulos, et al. Stability and Generalization of Learning Algorithms that Converge to Global Optima, 2017, ICML.
[28] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[29] Kyunghyun Cho, et al. The Break-Even Point on Optimization Trajectories of Deep Neural Networks, 2020, ICLR.
[30] Christoph H. Lampert, et al. Data-Dependent Stability of Stochastic Gradient Descent, 2017, ICML.
[31] Nathan Srebro, et al. Implicit Regularization in Matrix Factorization, 2017, 2018 Information Theory and Applications Workshop (ITA).
[32] Edgar Dobriban, et al. The Implicit Regularization of Stochastic Gradient Flow for Least Squares, 2020, ICML.
[33] Zhanxing Zhu, et al. On the Noisy Gradient Descent that Generalizes as SGD, 2019, ICML.
[34] Wenqing Hu, et al. On the diffusion approximation of nonconvex stochastic gradient descent, 2017, Annals of Mathematical Sciences and Applications.
[35] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[37] Gaël Richard, et al. On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks, 2019, arXiv.
[38] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[39] Jascha Sohl-Dickstein, et al. The large learning rate phase of deep learning: the catapult mechanism, 2020, arXiv.
[40] J. Zico Kolter, et al. A Continuous-Time View of Early Stopping for Least Squares Regression, 2018, AISTATS.
[41] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, arXiv.
[42] Matus Telgarsky, et al. The implicit bias of gradient descent on nonseparable data, 2019, COLT.
[43] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, arXiv.
[44] Mikhail Belkin, et al. Two models of double descent for weak features, 2019, SIAM J. Math. Data Sci.
[45] Zhanxing Zhu, et al. The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects, 2018, ICML.
[46] Sanjeev Arora, et al. Implicit Regularization in Deep Matrix Factorization, 2019, NeurIPS.
[47] Nathan Srebro, et al. Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate, 2018, AISTATS.
[48] Yuxin Chen, et al. Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval and Matrix Completion, 2018, ICML.