暂无分享,去创建一个
[1] Francis Bach,et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives , 2014, NIPS.
[2] Dan Alistarh,et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks , 2016, 1610.02132.
[3] Anit Kumar Sahu,et al. Federated Optimization in Heterogeneous Networks , 2018, MLSys.
[4] Boris Polyak. Gradient methods for the minimisation of functionals , 1963 .
[5] Alexander J. Smola,et al. Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.
[6] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[7] Quanquan Gu,et al. Stochastic Nested Variance Reduction for Nonconvex Optimization , 2018, J. Mach. Learn. Res..
[8] Jie Liu,et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient , 2017, ICML.
[9] Wei Zhang,et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent , 2017, NIPS.
[10] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[11] Saeed Ghadimi,et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization , 2013, Mathematical Programming.
[12] Konstantin Mishchenko,et al. Tighter Theory for Local SGD on Identical and Heterogeneous Data , 2020, AISTATS.
[13] Mihai Anitescu,et al. Degenerate Nonlinear Programming with a Quadratic Growth Condition , 1999, SIAM J. Optim..
[14] Xiang Li,et al. Communication Efficient Decentralized Training with Multiple Local Updates , 2019, ArXiv.
[15] Yi Zhou,et al. Random gradient extrapolation for distributed and stochastic optimization , 2017, SIAM J. Optim..
[16] Xiang Li,et al. Communication Efficient Decentralized Training with Multiple Local Updates , 2019, ArXiv.
[17] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[18] Sebastian U. Stich,et al. Stochastic Distributed Learning with Gradient Quantization and Variance Reduction , 2019, 1904.05115.
[19] Zeyuan Allen Zhu,et al. Variance Reduction for Faster Non-Convex Optimization , 2016, ICML.
[20] Sarit Khirirat,et al. Distributed learning with compressed gradients , 2018, 1806.06573.
[21] Martin Jaggi,et al. Sparsified SGD with Memory , 2018, NeurIPS.
[22] Peter Richtárik,et al. First Analysis of Local GD on Heterogeneous Data , 2019, ArXiv.
[23] Richard Nock,et al. Advances and Open Problems in Federated Learning , 2019, Found. Trends Mach. Learn..
[24] Michael I. Jordan,et al. Non-convex Finite-Sum Optimization Via SCSG Methods , 2017, NIPS.
[25] Zaïd Harchaoui,et al. A Universal Catalyst for First-Order Optimization , 2015, NIPS.
[26] Cong Xu,et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning , 2017, NIPS.
[27] Alexander J. Smola,et al. Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization , 2016, NIPS.
[28] Zhize Li,et al. Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization , 2020, ICML.
[29] Peter Richtárik,et al. Don't Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop , 2019, ALT.
[30] Peter Richt'arik,et al. Better Theory for SGD in the Nonconvex World , 2020, Trans. Mach. Learn. Res..
[31] Saeed Ghadimi,et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..
[32] Tong Zhang,et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator , 2018, NeurIPS.
[33] Peter Richtárik,et al. Federated Learning: Strategies for Improving Communication Efficiency , 2016, ArXiv.
[34] Zeyuan Allen-Zhu,et al. Katyusha: the first direct acceleration of stochastic gradient methods , 2016, J. Mach. Learn. Res..
[35] Zhize Li,et al. A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization , 2020, AISTATS.
[36] Guanghui Lan,et al. A unified variance-reduced accelerated gradient method for convex optimization , 2019, NeurIPS.
[37] Lam M. Nguyen,et al. ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization , 2019, J. Mach. Learn. Res..
[38] Peter Richtárik,et al. SGD: General Analysis and Improved Rates , 2019, ICML 2019.
[39] Mehryar Mohri,et al. SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning , 2019, ArXiv.
[40] Jian Li,et al. A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization , 2018, NeurIPS.
[41] Sashank J. Reddi,et al. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning , 2019, ICML.
[42] Peter Richtárik,et al. SEGA: Variance Reduction via Gradient Sketching , 2018, NeurIPS.
[43] Zhize Li,et al. SSRGD: Simple Stochastic Recursive Gradient Descent for Escaping Saddle Points , 2019, NeurIPS.
[44] Peter Richtárik,et al. Distributed Learning with Compressed Gradient Differences , 2019, ArXiv.
[45] Blaise Agüera y Arcas,et al. Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.
[46] Zhize Li,et al. Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization , 2019, COLT.
[47] Peter Richtárik,et al. A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent , 2019, AISTATS.
[48] Yi Zhou,et al. An optimal randomized incremental gradient method , 2015, Mathematical Programming.
[49] Sebastian U. Stich,et al. Local SGD Converges Fast and Communicates Little , 2018, ICLR.
[50] Qiang Yang,et al. Federated Machine Learning , 2019, ACM Trans. Intell. Syst. Technol..
[51] Dmitry Kovalev,et al. Distributed Fixed Point Methods with Compressed Iterates , 2019, ArXiv.
[52] Z.-Q. Luo,et al. Error bounds and convergence analysis of feasible descent methods: a general approach , 1993, Ann. Oper. Res..
[53] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[54] Peter Richtárik,et al. L-SVRG and L-Katyusha with Arbitrary Sampling , 2019, J. Mach. Learn. Res..
[55] Mark W. Schmidt,et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition , 2016, ECML/PKDD.
[56] Marco Canini,et al. Natural Compression for Distributed Deep Learning , 2019, MSML.
[57] Aurélien Lucchi,et al. Variance Reduced Stochastic Gradient Descent with Neighbors , 2015, NIPS.
[58] Kamyar Azizzadenesheli,et al. signSGD: compressed optimisation for non-convex problems , 2018, ICML.