[1] Ahmed M. Abdelmoniem, et al. Compressed Communication for Distributed Deep Learning: Survey and Quantitative Evaluation, 2020.
[2] John C. Duchi, et al. Distributed delayed stochastic optimization, 2011, 51st IEEE Conference on Decision and Control (CDC).
[3] Tong Zhang, et al. Error Compensated Distributed SGD Can Be Accelerated, 2020, NeurIPS.
[4] Yurii Nesterov, et al. Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems, 2012, SIAM J. Optim.
[5] Dan Alistarh, et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning, 2017, ICML.
[6] Peter Richtárik, et al. Coordinate descent with arbitrary sampling II: expected separable overapproximation, 2014, Optim. Methods Softw.
[7] Junzhou Huang, et al. Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization, 2018, ICML.
[8] Sebastian U. Stich, et al. The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication, 2019, arXiv:1909.05350.
[9] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.
[10] Yijun Huang, et al. Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization, 2015, NIPS.
[11] Martin Jaggi, et al. PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization, 2019, NeurIPS.
[12] Peter Richtárik, et al. Randomized Iterative Methods for Linear Systems, 2015, SIAM J. Matrix Anal. Appl.
[13] Peter Richtárik, et al. Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor, 2020, ArXiv.
[14] Ce Zhang, et al. Distributed Learning Systems with First-Order Methods, 2020, Found. Trends Databases.
[15] Ji Liu, et al. DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression, 2019, ICML.
[16] Peter Richtárik, et al. A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent, 2019, AISTATS.
[17] Stephen J. Wright, et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[18] Peter Richtárik, et al. On Biased Compression for Distributed Learning, 2020, ArXiv.
[19] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[20] Mark W. Schmidt, et al. Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence, 2017.
[21] Peter Richtárik, et al. One Method to Rule Them All: Variance Reduction for Data, Parameters and Many New Methods, 2019, ArXiv.
[22] Peter Richtárik, et al. Distributed Learning with Compressed Gradient Differences, 2019, ArXiv.
[23] Zeyuan Allen-Zhu, et al. Katyusha: the first direct acceleration of stochastic gradient methods, 2016, J. Mach. Learn. Res.
[24] Mark W. Schmidt, et al. Variance-Reduced Methods for Machine Learning, 2020, Proceedings of the IEEE.
[25] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[26] Eduard A. Gorbunov, et al. Linearly Converging Error Compensated SGD, 2020, NeurIPS.
[27] Martin Jaggi, et al. Error Feedback Fixes SignSGD and other Gradient Compression Schemes, 2019, ICML.
[28] Dan Alistarh, et al. The Convergence of Sparsified Gradient Methods, 2018, NeurIPS.
[29] Ali H. Sayed, et al. A Linearly Convergent Proximal Gradient Algorithm for Decentralized Optimization, 2019, NeurIPS.
[30] Peter Richtárik, et al. SEGA: Variance Reduction via Gradient Sketching, 2018, NeurIPS.
[31] Peter Richtárik, et al. Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches, 2018, AISTATS.
[32] Zeyuan Allen-Zhu, et al. Katyusha: the first direct acceleration of stochastic gradient methods, 2017, STOC.
[33] Peter Richtárik, et al. On Stochastic Sign Descent Methods, 2019.
[34] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[35] Xun Qian, et al. Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization, 2020, ICML.
[36] Sebastian U. Stich, et al. Stochastic Distributed Learning with Gradient Quantization and Variance Reduction, 2019, arXiv:1904.05115.
[37] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2), 1983.
[38] Peter Richtárik, et al. Don't Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop, 2019, ALT.
[39] Nathan Srebro, et al. Minibatch vs Local SGD for Heterogeneous Distributed Learning, 2020, NeurIPS.
[40] Martin Jaggi, et al. Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication, 2019, ICML.
[41] Marco Canini, et al. Natural Compression for Distributed Deep Learning, 2019, MSML.
[42] Joseph K. Bradley, et al. Parallel Coordinate Descent for L1-Regularized Loss Minimization, 2011, ICML.
[43] Ohad Shamir, et al. Is Local SGD Better than Minibatch SGD?, 2020, ICML.
[44] Peter Richtárik, et al. Parallel coordinate descent methods for big data optimization, 2012, Mathematical Programming.
[45] Peter Richtárik, et al. On optimal probabilities in stochastic coordinate descent methods, 2013, Optim. Lett.
[46] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.
[47] Peter Richtárik, et al. 99% of Worker-Master Communication in Distributed Optimization Is Not Needed, 2020, UAI.
[48] Peter Richtárik, et al. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, 2011, Mathematical Programming.
[49] Sarit Khirirat, et al. Distributed learning with compressed gradients, 2018, arXiv:1806.06573.
[50] Kamyar Azizzadenesheli, et al. signSGD: compressed optimisation for non-convex problems, 2018, ICML.
[51] Panos Kalnis, et al. Scaling Distributed Machine Learning with In-Network Aggregation, 2019, NSDI.
[52] Peter Richtárik, et al. Coordinate descent with arbitrary sampling I: algorithms and complexity, 2014, Optim. Methods Softw.
[53] Dimitris S. Papailiopoulos, et al. ATOMO: Communication-efficient Learning via Atomic Sparsification, 2018, NeurIPS.
[54] Sashank J. Reddi, et al. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning, 2019, ICML.
[55] Sebastian U. Stich, et al. Local SGD Converges Fast and Communicates Little, 2018, ICLR.