[1] Ali H. Sayed,et al. Convergence of Variance-Reduced Stochastic Learning under Random Reshuffling , 2017, ArXiv.
[2] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[3] Dimitri P. Bertsekas,et al. Incremental proximal methods for large scale convex optimization , 2011, Math. Program..
[4] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[5] Anthony Man-Cho So,et al. Incremental Methods for Weakly Convex Optimization , 2019, ArXiv.
[6] Konstantin Mishchenko,et al. Random Reshuffling: Simple Analysis with Vast Improvements , 2020, NeurIPS.
[7] Peter Richtárik,et al. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption , 2018, ICML.
[8] Ohad Shamir,et al. Without-Replacement Sampling for Stochastic Gradient Methods , 2016, NIPS.
[9] Sebastian Ruder,et al. An overview of gradient descent optimization algorithms , 2016, Vestnik komp'iuternykh i informatsionnykh tekhnologii.
[10] D. Bertsekas,et al. Convergence Rate of Incremental Subgradient Algorithms , 2000 .
[11] Mark W. Schmidt,et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets , 2012, NIPS.
[12] Prateek Jain,et al. SGD without Replacement: Sharper Rates for General Smooth Convex Functions , 2019, ICML.
[13] Francis Bach,et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives , 2014, NIPS.
[14] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[15] Jorge Nocedal,et al. Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..
[16] Marten van Dijk,et al. A Unified Convergence Analysis for Shuffling-Type Gradient Methods , 2020, ArXiv.
[17] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2) , 1983 .
[18] Jie Liu,et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient , 2017, ICML.
[19] Tie-Yan Liu,et al. Convergence Analysis of Distributed Stochastic Gradient Descent with Shuffling , 2017, Neurocomputing.
[20] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[21] Saeed Ghadimi,et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..
[22] Peter Richtárik,et al. New Convergence Aspects of Stochastic Gradient Algorithms , 2018, J. Mach. Learn. Res..
[23] H. Robbins. A Stochastic Approximation Method , 1951 .
[24] Ohad Shamir,et al. How Good is SGD with Random Shuffling? , 2019, COLT.
[25] Leslie N. Smith,et al. Cyclical Learning Rates for Training Neural Networks , 2015, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).
[26] Dimitri P. Bertsekas,et al. Incremental Subgradient Methods for Nondifferentiable Optimization , 2001, SIAM J. Optim..
[27] Mingyi Hong,et al. On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization , 2018, ICLR.
[28] Lam M. Nguyen,et al. ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization , 2019, J. Mach. Learn. Res..
[29] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[30] Roland Vollgraf,et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms , 2017, ArXiv.
[31] Lam M. Nguyen,et al. A Hybrid Stochastic Optimization Framework for Stochastic Composite Nonconvex Optimization , 2019, ArXiv.
[32] L. Bottou. Curiously Fast Convergence of some Stochastic Gradient Descent Algorithms , 2009 .
[33] Ali H. Sayed,et al. Variance-Reduced Stochastic Learning Under Random Reshuffling , 2017, IEEE Transactions on Signal Processing.
[34] Dimitris Papailiopoulos,et al. Closing the convergence gap of SGD without replacement , 2020, ICML.
[35] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[36] Léon Bottou,et al. Stochastic Gradient Descent Tricks , 2012, Neural Networks: Tricks of the Trade.
[37] Suvrit Sra,et al. Random Shuffling Beats SGD after Finite Epochs , 2018, ICML.
[38] Boris Polyak. Some methods of speeding up the convergence of iteration methods , 1964 .
[39] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[40] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[41] Francesco Orabona,et al. Exponential Step Sizes for Non-Convex Optimization , 2020, ArXiv.
[42] Richard G. Baraniuk,et al. Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent , 2020, ArXiv.
[43] Timothy Dozat,et al. Incorporating Nesterov Momentum into Adam , 2016 .
[44] Asuman E. Ozdaglar,et al. Why random reshuffling beats stochastic gradient descent , 2015, Mathematical Programming.
[45] Chih-Jen Lin,et al. LIBSVM: A library for support vector machines , 2011, TIST.
[46] Alexander J. Smola,et al. Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.