Eduard Gorbunov | Peter Richtárik | Zhize Li | Ilyas Fatkhullin | Igor Sokolov
[1] Amir Beck et al. First-Order Methods in Optimization, 2017.
[2] Peter Richtárik et al. FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning, 2021, arXiv preprint.
[3] Tianbao Yang et al. Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization, 2016, arXiv:1604.03257.
[4] Léon Bottou et al. Stochastic Gradient Descent Tricks, 2012, Neural Networks: Tricks of the Trade.
[5] Dong Yu et al. 1-Bit Stochastic Gradient Descent and Its Application to Data-Parallel Distributed Training of Speech DNNs, 2014, INTERSPEECH.
[6] Haibo Yang et al. Achieving Linear Speedup with Partial Worker Participation in Non-IID Federated Learning, 2021, ICLR.
[7] Sebastian U. Stich et al. Stochastic Distributed Learning with Gradient Quantization and Variance Reduction, 2019, arXiv:1904.05115.
[8] Peter Richtárik et al. A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning, 2020, ICLR.
[9] Nathan Srebro et al. Lower Bounds for Non-Convex Stochastic Optimization, 2019, arXiv preprint.
[10] Eduard Gorbunov et al. MARINA: Faster Non-Convex Distributed Learning with Compression, 2021, ICML.
[11] Martin Jaggi et al. PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization, 2019, NeurIPS.
[12] James Demmel et al. Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes, 2019, ICLR.
[13] Eduard Gorbunov et al. Linearly Converging Error Compensated SGD, 2020, NeurIPS.
[14] Indranil Gupta et al. CSER: Communication-Efficient SGD with Error Reset, 2020, NeurIPS.
[15] Dan Alistarh et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[16] Ji Liu et al. DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression, 2019, ICML.
[17] Martin Jaggi et al. Sparsified SGD with Memory, 2018, NeurIPS.
[18] Peter Richtárik et al. On Biased Compression for Distributed Learning, 2020, arXiv preprint.
[19] Jianyu Wang et al. Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies, 2020, arXiv preprint.
[20] Léon Bottou. Curiously Fast Convergence of Some Stochastic Gradient Descent Algorithms, 2009.
[21] Chih-Jen Lin et al. LIBSVM: A Library for Support Vector Machines, 2011, TIST.
[22] Sanjeev Arora et al. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization, 2018, ICML.
[23] Shai Ben-David et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[24] Zeyuan Allen-Zhu et al. Katyusha: The First Direct Acceleration of Stochastic Gradient Methods, 2017, STOC.
[25] Peter Richtárik et al. Distributed Second Order Methods with Fast Rates and Compressed Communication, 2021, ICML.
[26] Dan Alistarh et al. The Convergence of Sparsified Gradient Methods, 2018, NeurIPS.
[27] Marco Canini et al. Natural Compression for Distributed Deep Learning, 2019, MSML.