Robert M. Gower | Othmane Sebbouh | Nicolas Loizou | Peter Richtárik | Ahmed Khaled