Peter Richtárik | Konstantin Mishchenko | Ahmed Khaled
[1] Lisandro Dalcin, et al. Parallel distributed computing using Python, 2011.
[2] Sebastian Caldas, et al. Expanding the Reach of Federated Learning by Reducing Client Resource Requirements, 2018, ArXiv.
[3] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.
[4] Gideon S. Mann, et al. Distributed Training Strategies for the Structured Perceptron, 2010, NAACL.
[5] Torsten Hoefler, et al. Demystifying Parallel and Distributed Deep Learning, 2018, ACM Comput. Surv.
[6] O. Mangasarian. Parallel Gradient Distribution in Unconstrained Optimization, 1995.
[7] Peng Jiang, et al. A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication, 2018, NeurIPS.
[8] Peter Richtárik, et al. Federated Learning: Strategies for Improving Communication Efficiency, 2016, ArXiv.
[9] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[10] Ohad Shamir, et al. Optimal Distributed Online Prediction Using Mini-Batches, 2010, J. Mach. Learn. Res.
[11] Jianyu Wang, et al. Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms, 2018, ArXiv.
[12] Fan Zhou, et al. On the convergence properties of a K-step averaging stochastic gradient descent algorithm for nonconvex optimization, 2017, IJCAI.
[13] Suhas Diggavi, et al. Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations, 2019, IEEE Journal on Selected Areas in Information Theory.
[14] Peter Richtárik, et al. SGD: General Analysis and Improved Rates, 2019, ICML.
[15] Shenghuo Zhu, et al. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning, 2018, AAAI.
[16] Sebastian U. Stich. Local SGD Converges Fast and Communicates Little, 2018, ICLR.
[17] Gregory F. Coppola. Iterative parameter mixing for distributed large-margin training of structured predictors for natural language processing, 2015.