Adaptive gradient descent without descent