Laurent Condat | Peter Richtárik | Mher Safaryan | Alyazeed Albasyoni
[1] Mehryar Mohri, et al. SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning, 2019, ArXiv.
[2] Aleksandr Beznosikov, et al. On Biased Compression for Distributed Learning, 2020, ArXiv.
[3] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[4] Zhize Li, et al. Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization, 2020, ICML.
[5] K. Böröczky, et al. Covering the Sphere by Equal Spherical Balls, 2003.
[6] Mark W. Schmidt, et al. Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron, 2018, AISTATS.
[7] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[8] Martin J. Wainwright, et al. Information-theoretic lower bounds for distributed statistical estimation with communication constraints, 2013, NIPS.
[9] Sebastian U. Stich, et al. Stochastic Distributed Learning with Gradient Quantization and Variance Reduction, 2019, ArXiv.
[10] Sarit Khirirat, et al. Distributed learning with compressed gradients, 2018, ArXiv.
[11] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[12] Peter Richtárik, et al. Randomized Distributed Mean Estimation: Accuracy vs. Communication, 2016, Front. Appl. Math. Stat.
[13] M. Kochol. Constructive approximation of a ball by polytopes, 1994.
[14] Jürgen Schmidhuber. Deep learning in neural networks: An overview, 2014, Neural Networks.
[15] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[16] Suhas Diggavi, et al. Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations, 2019, IEEE Journal on Selected Areas in Information Theory.
[17] Peter Richtárik, et al. Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor, 2020, ArXiv.
[18] Martin Jaggi, et al. PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization, 2019, NeurIPS.
[19] Sebastian U. Stich, et al. The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication, 2019, ArXiv.
[20] James T. Kwok, et al. Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback, 2019, NeurIPS.
[21] Sewoong Oh, et al. Rate Distortion For Model Compression: From Theory To Practice, 2018, ICML.
[22] C. E. Shannon. A Mathematical Theory of Communication, 1948, Bell System Technical Journal.
[23] Peter Richtárik, et al. Federated Learning: Strategies for Improving Communication Efficiency, 2016, ArXiv.
[24] Dimitris S. Papailiopoulos, et al. ATOMO: Communication-efficient Learning via Atomic Sparsification, 2018, NeurIPS.
[25] Ilya Dumer. Covering Spheres with Spheres, 2007, Discret. Comput. Geom.
[26] Dan Alistarh, et al. The Convergence of Sparsified Gradient Methods, 2018, NeurIPS.
[27] C. E. Shannon. Coding Theorems for a Discrete Source With a Fidelity Criterion, 1959, Institute of Radio Engineers, International Convention Record, vol. 7.
[28] Sebastian U. Stich. Local SGD Converges Fast and Communicates Little, 2018, ICLR.
[29] Thomas M. Cover, et al. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing), 2006.
[30] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, ArXiv.
[31] Ji Liu, et al. DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression, 2019, ICML.
[32] Konstantin Mishchenko, et al. Tighter Theory for Local SGD on Identical and Heterogeneous Data, 2020, AISTATS.
[33] Martin Jaggi, et al. Error Feedback Fixes SignSGD and other Gradient Compression Schemes, 2019, ICML.
[34] Shaofeng Zou, et al. Information-Theoretic Understanding of Population Risk Improvement with Model Compression, 2019, AAAI.
[35] Dan Alistarh, et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning, 2017, ICML.
[36] Ananda Theertha Suresh, et al. Distributed Mean Estimation with Limited Communication, 2016, ICML.
[37] Peter Richtárik, et al. Distributed Learning with Compressed Gradient Differences, 2019, ArXiv.
[38] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[39] Sam Ade Jacobs, et al. Communication Quantization for Data-Parallel Training of Deep Neural Networks, 2016, 2nd Workshop on Machine Learning in HPC Environments (MLHPC).
[40] John Langford, et al. Scaling up machine learning: parallel and distributed approaches, 2011, KDD '11 Tutorials.
[41] Marco Canini, et al. Natural Compression for Distributed Deep Learning, 2019, MSML.