[1] Shankar Krishnan, et al. An Investigation into Neural Net Optimization via Hessian Eigenvalue Density, 2019, ICML.
[2] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[3] Julien Mairal, et al. A Generic Acceleration Framework for Stochastic Composite Optimization, 2019, NeurIPS.
[4] Asuman E. Ozdaglar, et al. Robust Accelerated Gradient Methods for Smooth Strongly Convex Functions, 2018, SIAM J. Optim.
[5] Upendra Dave, et al. Applied Probability and Queues, 1987.
[6] Vardan Papyan, et al. The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size, 2018.
[7] Adam M. Oberman, et al. Nesterov's method with decreasing learning rate leads to accelerated stochastic gradient descent, 2019.
[8] E Weinan, et al. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms, 2015, ICML.
[9] Stephen J. Roberts, et al. Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training, 2020, J. Mach. Learn. Res.
[10] Aaron Defazio, et al. Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball, 2021, COLT.
[11] Francis R. Bach, et al. From Averaging to Acceleration, There is Only a Step-size, 2015, COLT.
[12] E Weinan, et al. Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations, 2018, J. Mach. Learn. Res.
[13] Mert Gürbüzbalaban, et al. Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances, 2019, ICML.
[14] Fabian Pedregosa, et al. SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality, 2021, COLT.
[15] Zeyuan Allen-Zhu, et al. Katyusha: the first direct acceleration of stochastic gradient methods, 2016, J. Mach. Learn. Res.
[16] Mark W. Schmidt, et al. Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron, 2018, AISTATS.
[17] Michael G. Rabbat, et al. On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings, 2020, ICML.
[18] Saeed Ghadimi, et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmic Framework, 2012, SIAM J. Optim.
[19] Prateek Jain, et al. On the Insufficiency of Existing Momentum Schemes for Stochastic Optimization, 2018, Information Theory and Applications Workshop (ITA).
[20] Jeffrey Pennington, et al. Geometry of Neural Network Loss Surfaces via Random Matrix Theory, 2017, ICML.
[21] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[22] F. Krzakala, et al. Asymptotic Errors for High-Dimensional Convex Penalized Linear Regression beyond Gaussian Matrices, 2020, COLT.
[23] Kristian Kirsch, et al. Theory of Ordinary Differential Equations, 2016.
[24] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[25] Aurélien Lucchi, et al. The Role of Memory in Stochastic Optimization, 2019, UAI.
[26] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[27] Andrea Montanari, et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve, 2019, Communications on Pure and Applied Mathematics.
[28] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[29] Saeed Ghadimi, et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization, II: Shrinking Procedures and Optimal Algorithms, 2013, SIAM J. Optim.
[30] Zhenyu Liao, et al. A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent, 2020, NeurIPS.
[31] Guodong Zhang, et al. Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model, 2019, NeurIPS.
[32] G. Samuel Jordan, et al. Volterra Integral and Functional Equations (Encyclopedia of Mathematics and its Applications 34), 1991.
[33] Peter Richtárik, et al. Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods, 2017, Computational Optimization and Applications.
[34] Yann LeCun, et al. Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond, 2016, arXiv:1611.07476.
[35] Jason Yosinski, et al. Measuring the Intrinsic Dimension of Objective Landscapes, 2018, ICLR.
[36] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.
[37] Yi Yang, et al. A Unified Analysis of Stochastic Momentum Methods for Deep Learning, 2018, IJCAI.
[38] David M. Blei, et al. A Variational Analysis of Stochastic Gradient Algorithms, 2016, ICML.
[39] Mikhail Belkin, et al. Accelerating SGD with momentum for over-parameterized learning, 2018, ICLR.
[40] Michael W. Mahoney, et al. Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning, 2018, J. Mach. Learn. Res.