[1] Richard Nock, et al. Advances and Open Problems in Federated Learning, 2021, Found. Trends Mach. Learn.
[2] Konstantin Mishchenko, et al. 99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it, 2019.
[3] Peter Richtárik, et al. Quartz: Randomized Dual Coordinate Ascent with Arbitrary Sampling, 2015, NIPS.
[4] Mark W. Schmidt, et al. A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method, 2012, ArXiv.
[5] W. M. Goodall. Television by pulse code modulation, 1951.
[6] Yurii Nesterov, et al. Linear convergence of first order methods for non-strongly convex optimization, 2015, Math. Program.
[7] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[8] Onur Mutlu, et al. Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds, 2017, NSDI.
[9] Peter Richtárik, et al. 99% of Parallel Optimization is Inevitably a Waste of Time, 2019, ArXiv.
[10] Sashank J. Reddi, et al. SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning, 2019, ArXiv.
[11] Sebastian U. Stich, et al. Stochastic Distributed Learning with Gradient Quantization and Variance Reduction, 2019, arXiv:1904.05115.
[12] Ahmed M. Abdelmoniem, et al. On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning, 2019, AAAI.
[13] Peter Richtárik, et al. Distributed Learning with Compressed Gradient Differences, 2019, ArXiv.
[14] Peter Richtárik, et al. Randomized Distributed Mean Estimation: Accuracy vs. Communication, 2016, Front. Appl. Math. Stat.
[15] Nathan Srebro, et al. Semi-Cyclic Stochastic Gradient Descent, 2019, ICML.
[16] Kenneth Heafield, et al. Sparse Communication for Distributed Gradient Descent, 2017, EMNLP.
[17] Dan Alistarh, et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning, 2017, ICML.
[18] Mark W. Schmidt, et al. Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron, 2018, AISTATS.
[19] Martin Jaggi, et al. PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization, 2019, NeurIPS.
[20] Sebastian U. Stich, et al. Local SGD Converges Fast and Communicates Little, 2018, ICLR.
[21] Peter Richtárik, et al. A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent, 2019, AISTATS.
[22] Peter Richtárik, et al. On Biased Compression for Distributed Learning, 2020, ArXiv.
[23] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[24] Dan Alistarh, et al. QSGD: Randomized Quantization for Communication-Optimal Stochastic Gradient Descent, 2016, ArXiv.
[25] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[26] Suhas Diggavi, et al. Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations, 2019, IEEE Journal on Selected Areas in Information Theory.
[27] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[28] Xun Qian, et al. Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization, 2020, ICML.
[29] Peter Richtárik, et al. SGD: General Analysis and Improved Rates, 2019, ICML.
[30] Martin Jaggi, et al. Sparsified SGD with Memory, 2018, NeurIPS.
[31] Sebastian U. Stich, et al. Unified Optimal Analysis of the (Stochastic) Gradient Method, 2019, ArXiv.
[32] Eric P. Xing, et al. Managed communication and consistency for fast data-parallel iterative analytics, 2015, SoCC.
[33] Konstantin Mishchenko, et al. Tighter Theory for Local SGD on Identical and Heterogeneous Data, 2020, AISTATS.
[34] Peter Richtárik, et al. Nonconvex Variance Reduced Optimization with Arbitrary Sampling, 2018, ICML.
[35] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[36] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[37] Tianbao Yang, et al. Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization, 2016, arXiv:1604.03257.
[38] S. Gadat, et al. Stochastic Heavy Ball, 2016, arXiv:1609.04228.
[39] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
[40] Ohad Shamir, et al. Is Local SGD Better than Minibatch SGD?, 2020, ICML.
[41] Peter Richtárik, et al. Parallel coordinate descent methods for big data optimization, 2012, Mathematical Programming.
[42] Jean-Baptiste Cordonnier, et al. Convex Optimization using Sparsified Stochastic Gradient Descent with Memory, 2018.
[43] Sashank J. Reddi, et al. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning, 2019, ICML.
[44] Antonin Chambolle, et al. Stochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling and Imaging Applications, 2017, SIAM J. Optim.
[45] Tao Lin, et al. Don't Use Large Mini-Batches, Use Local SGD, 2018, ICLR.
[46] Daniel M. Roy, et al. NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization, 2019, ArXiv.
[47] Alexander J. Smola, et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[48] Hyeontaek Lim, et al. 3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning, 2018, MLSys.
[49] Sebastian U. Stich, et al. The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication, 2019, arXiv:1909.05350.
[50] Benjamin Grimmer, et al. Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity, 2017, SIAM J. Optim.
[51] Martin Jaggi, et al. Error Feedback Fixes SignSGD and other Gradient Compression Schemes, 2019, ICML.
[52] Dan Alistarh, et al. The Convergence of Sparsified Gradient Methods, 2018, NeurIPS.
[53] Martin Jaggi, et al. Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication, 2019, ICML.
[54] Marco Canini, et al. Natural Compression for Distributed Deep Learning, 2019, MSML.
[55] Lawrence G. Roberts, et al. Picture coding using pseudo-random noise, 1962, IRE Trans. Inf. Theory.
[56] Peter Richtárik, et al. Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods, 2017, Computational Optimization and Applications.
[57] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.