[1] Martin Jaggi, et al. PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning, 2020, NeurIPS.
[2] Martin Jaggi, et al. A Unified Theory of Decentralized SGD with Changing Topology and Local Updates, 2020, ICML.
[3] Dan Alistarh, et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning, 2017, ICML.
[4] Martin Jaggi, et al. Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication, 2019, ICML.
[5] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[6] Martin Jaggi, et al. Sparsified SGD with Memory, 2018, NeurIPS.
[7] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
[8] Nikko Strom, et al. Scalable distributed DNN training using commodity GPU cloud computing, 2015, INTERSPEECH.
[9] Peter Richtárik, et al. Federated Learning: Strategies for Improving Communication Efficiency, 2016, ArXiv.
[10] Jean-Baptiste Cordonnier, et al. Convex Optimization using Sparsified Stochastic Gradient Descent with Memory, 2018.
[11] Ji Liu, et al. Gradient Sparsification for Communication-Efficient Distributed Optimization, 2017, NeurIPS.
[12] Peter Richtárik, et al. Distributed Learning with Compressed Gradient Differences, 2019, ArXiv.
[13] Vladimir Braverman, et al. FetchSGD: Communication-Efficient Federated Learning with Sketching, 2022.
[14] Kamyar Azizzadenesheli, et al. signSGD: compressed optimisation for non-convex problems, 2018, ICML.
[15] Sebastian U. Stich, et al. Stochastic Distributed Learning with Gradient Quantization and Variance Reduction, 2019, arXiv:1904.05115.
[16] Sarit Khirirat, et al. Distributed learning with compressed gradients, 2018, arXiv:1806.06573.
[17] Martin Jaggi, et al. PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization, 2019, NeurIPS.
[18] Dan Alistarh, et al. The Convergence of Sparsified Gradient Methods, 2018, NeurIPS.
[19] Hanlin Tang, et al. Communication Compression for Decentralized Training, 2018, NeurIPS.
[20] Kenneth Heafield, et al. Sparse Communication for Distributed Gradient Descent, 2017, EMNLP.
[21] Jitendra Malik, et al. Trajectory Normalized Gradients for Distributed Optimization, 2019, ArXiv.
[22] Ji Liu, et al. DeepSqueeze: Decentralization Meets Error-Compensated Compression, 2019.
[23] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[24] Martin Jaggi, et al. Fully Quantized Distributed Gradient Descent, 2018.
[25] Martin Jaggi, et al. Error Feedback Fixes SignSGD and other Gradient Compression Schemes, 2019, ICML.
[26] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[27] Nathan Srebro, et al. Minibatch vs Local SGD for Heterogeneous Distributed Learning, 2020, NeurIPS.
[28] Sebastian U. Stich, et al. Unified Optimal Analysis of the (Stochastic) Gradient Method, 2019, ArXiv.
[29] Robert M. Gower, et al. Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization, 2020, Journal of Optimization Theory and Applications.
[30] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[31] Amar Phanishayee, et al. The Non-IID Data Quagmire of Decentralized Machine Learning, 2019, ICML.
[32] Vladimir Braverman, et al. Communication-efficient distributed SGD with Sketching, 2019, NeurIPS.
[33] Sebastian U. Stich, et al. The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication, 2019, ArXiv.
[34] Aleksandr Beznosikov, et al. On Biased Compression for Distributed Learning, 2020, ArXiv.
[35] Zaïd Harchaoui, et al. A Universal Catalyst for First-Order Optimization, 2015, NIPS.
[36] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[37] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[38] Martin Jaggi, et al. Decentralized Deep Learning with Arbitrary Communication Compression, 2019, ICLR.
[39] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[40] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[41] Mehryar Mohri, et al. SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning, 2019, ArXiv.
[42] Xiang Li, et al. On the Convergence of FedAvg on Non-IID Data, 2019, ICLR.