Adaptive Learning Rates for Faster Stochastic Gradient Methods
[1] Robert Mansel Gower, et al. SP2: A Second Order Stochastic Polyak Method, 2022, arXiv.
[2] Robert Mansel Gower, et al. Cutting Some Slack for SGD with Adaptive Polyak Stepsizes, 2022, arXiv.
[3] Aaron Defazio, et al. Stochastic Polyak Stepsize with a Moving Target, 2021, arXiv.
[4] Sharan Vaswani, et al. Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence, 2020, AISTATS.
[5] Adam M. Oberman, et al. Stochastic Gradient Descent with Polyak's Learning Rate, 2019, Journal of Scientific Computing.
[6] Tyler B. Johnson, et al. AdaScale SGD: A User-Friendly Algorithm for Distributed Training, 2020, ICML.
[7] Philipp Hennig, et al. BackPACK: Packing more into backprop, 2019, ICLR.
[8] Konstantin Mishchenko, et al. Adaptive gradient descent without descent, 2019, ICML.
[9] Liyuan Liu, et al. On the Variance of the Adaptive Learning Rate and Beyond, 2019, ICLR.
[10] Andrew Zisserman, et al. Training Neural Networks for and by Interpolation, 2019, ICML.
[11] Peter Richtárik, et al. Don't Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop, 2019, ALT.
[12] Peter Richtárik, et al. Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory, 2017, SIAM J. Matrix Anal. Appl.
[13] Dmitry Kovalev, et al. Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates, 2019, arXiv.
[14] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[15] Xiaoxia Wu, et al. AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization, 2018, ICML.
[16] S. Kakade, et al. Revisiting the Polyak step size, 2019, arXiv:1905.00313.
[17] Francis Bach, et al. Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions, 2019, COLT.
[18] Peter Richtárik, et al. SGD: General Analysis and Improved Rates, 2019, ICML.
[19] Sebastian U. Stich, et al. Local SGD Converges Fast and Communicates Little, 2018, ICLR.
[20] Francesco Orabona, et al. On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes, 2018, AISTATS.
[21] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[22] Sashank J. Reddi, et al. On the Convergence of Adam and Beyond, 2018, ICLR.
[23] Peter Richtárik, et al. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption, 2018, ICML.
[24] Michael I. Jordan, et al. Stochastic Cubic Regularization for Fast Nonconvex Optimization, 2017, NeurIPS.
[25] Dimitris S. Papailiopoulos, et al. Gradient Diversity: a Key Ingredient for Scalable Distributed Learning, 2017, AISTATS.
[26] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[27] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[28] Zeyuan Allen-Zhu, et al. Katyusha: the first direct acceleration of stochastic gradient methods, 2016, J. Mach. Learn. Res.
[29] Michael I. Jordan, et al. Distributed optimization with arbitrary local solvers, 2015, Optim. Methods Softw.
[30] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
[31] Shiqian Ma, et al. Barzilai-Borwein Step Size for Stochastic Gradient Descent, 2016, NIPS.
[32] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[33] Peter Richtárik, et al. Quartz: Randomized Dual Coordinate Ascent with Arbitrary Sampling, 2015, NIPS.
[34] Tong Zhang, et al. Stochastic Optimization with Importance Sampling for Regularized Loss Minimization, 2014, ICML.
[35] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[36] Deanna Needell, et al. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm, 2013, Mathematical Programming.
[37] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[38] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[39] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[40] Shai Shalev-Shwartz, et al. Stochastic dual coordinate ascent methods for regularized loss, 2012, J. Mach. Learn. Res.
[41] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[42] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[43] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.
[44] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[45] Léon Bottou, et al. The Tradeoffs of Large Scale Learning, 2007, NIPS.
[46] H. Robbins. A Stochastic Approximation Method, 1951.
[47] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[48] A. S. Nemirovsky, D. B. Yudin. Problem Complexity and Method Efficiency in Optimization, 1983.
[49] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.