暂无分享,去创建一个
[1] Dimitri P. Bertsekas,et al. Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey , 2015, ArXiv.
[2] Sebastian U. Stich,et al. Unified Optimal Analysis of the (Stochastic) Gradient Method , 2019, ArXiv.
[3] Pablo A. Parrilo,et al. Convergence Rate of Incremental Gradient and Incremental Newton Methods , 2019, SIAM J. Optim..
[4] Peter Richtárik,et al. SGD: General Analysis and Improved Rates , 2019, ICML 2019.
[5] Asuman E. Ozdaglar,et al. Why random reshuffling beats stochastic gradient descent , 2015, Mathematical Programming.
[6] John N. Tsitsiklis,et al. Gradient Convergence in Gradient methods with Errors , 1999, SIAM J. Optim..
[7] Lek-Heng Lim,et al. Recht-Ré Noncommutative Arithmetic-Geometric Mean Conjecture is False , 2020, ICML.
[8] Suvrit Sra,et al. On Tight Convergence Rates of Without-replacement SGD , 2020, ArXiv.
[9] Peter Richtárik,et al. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption , 2018, ICML.
[10] Marten van Dijk,et al. Tight Dimension Independent Lower Bound on the Expected Convergence Rate for Diminishing Step Sizes in SGD , 2018, NeurIPS.
[11] Marten van Dijk,et al. A Unified Convergence Analysis for Shuffling-Type Gradient Methods , 2020, ArXiv.
[12] Anthony Man-Cho So,et al. Incremental Methods for Weakly Convex Optimization , 2019, ArXiv.
[13] Ohad Shamir,et al. How Good is SGD with Random Shuffling? , 2019, COLT.
[14] Suvrit Sra,et al. Random Shuffling Beats SGD after Finite Epochs , 2018, ICML.
[15] Dimitris Papailiopoulos,et al. Closing the convergence gap of SGD without replacement , 2020, ICML.
[16] Yoshua Bengio,et al. Practical Recommendations for Gradient-Based Training of Deep Architectures , 2012, Neural Networks: Tricks of the Trade.
[17] Ruoyu Sun,et al. Optimization for deep learning: theory and algorithms , 2019, ArXiv.
[18] Prateek Jain,et al. SGD without Replacement: Sharper Rates for General Smooth Convex Functions , 2019, ICML.
[19] L. Bottou. Curiously Fast Convergence of some Stochastic Gradient Descent Algorithms , 2009 .
[20] Dimitri P. Bertsekas,et al. Incremental Subgradient Methods for Nondifferentiable Optimization , 2001, SIAM J. Optim..
[21] Luigi Grippo,et al. A class of unconstrained minimization methods for neural network training , 1994 .
[22] Ruo-Yu Sun,et al. Optimization for Deep Learning: An Overview , 2020, Journal of the Operations Research Society of China.
[23] Jorge Nocedal,et al. Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..
[24] O. Mangasarian,et al. Serial and parallel backpropagation convergence via nonmonotone perturbed minimization , 1994 .
[25] Ohad Shamir,et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization , 2011, ICML.
[26] Peter Richt'arik,et al. Better Theory for SGD in the Nonconvex World , 2020, Trans. Mach. Learn. Res..
[27] Zhi-Quan Luo,et al. On the Convergence of the LMS Algorithm with Adaptive Learning Rate for Linear Feedforward Networks , 1991, Neural Computation.
[28] Konstantin Mishchenko,et al. Adaptive gradient descent without descent , 2019, ICML.
[29] Christopher Ré,et al. Parallel stochastic gradient algorithms for large-scale matrix completion , 2013, Mathematical Programming Computation.
[30] Ali H. Sayed,et al. Stochastic Learning Under Random Reshuffling With Constant Step-Sizes , 2018, IEEE Transactions on Signal Processing.
[31] Ohad Shamir,et al. The Complexity of Finding Stationary Points with Stochastic Gradient Descent , 2020, ICML.
[32] Christopher Ré,et al. Toward a Noncommutative Arithmetic-geometric Mean Inequality: Conjectures, Case-studies, and Consequences , 2012, COLT.
[33] Deanna Needell,et al. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm , 2013, Mathematical Programming.
[34] Marc Teboulle,et al. Convergence Analysis of a Proximal-Like Minimization Algorithm Using Bregman Functions , 1993, SIAM J. Optim..