Ohad Shamir | Nathan Srebro | Sebastian U. Stich | H. Brendan McMahan | Blake E. Woodworth | Kumar Kshitij Patel | Zhen Dai | Brian Bullins
[1] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[2] Tao Lin, et al. Don't Use Large Mini-Batches, Use Local SGD, 2018, ICLR.
[3] Ohad Shamir, et al. Communication-Efficient Distributed Optimization using an Approximate Newton-type Method, 2013, ICML.
[4] Ohad Shamir, et al. Distributed stochastic optimization and learning, 2014, 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[5] Alexander J. Smola, et al. Efficient mini-batch training for stochastic optimization, 2014, KDD.
[6] Nathan Srebro, et al. Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization, 2018, NeurIPS.
[7] Sebastian U. Stich, et al. Unified Optimal Analysis of the (Stochastic) Gradient Method, 2019, ArXiv.
[8] Prateek Jain, et al. Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-batching, Averaging, and Model Misspecification, 2016, J. Mach. Learn. Res.
[9] Sebastian U. Stich, et al. The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication, 2019, ArXiv.
[10] Farzin Haddadpour, et al. Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization, 2019, ICML.
[11] Richard Nock, et al. Advances and Open Problems in Federated Learning, 2019, Found. Trends Mach. Learn.
[12] Jianyu Wang, et al. Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms, 2018, ArXiv.
[13] Mehryar Mohri, et al. SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning, 2019, ArXiv.
[14] Shenghuo Zhu, et al. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning, 2018, AAAI.
[15] Farzin Haddadpour, et al. Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization, 2019, NeurIPS.
[16] Sofiane Saadane, et al. On the rates of convergence of parallelized averaged stochastic gradient algorithms, 2017, Statistics.
[17] Ohad Shamir, et al. Optimal Distributed Online Prediction Using Mini-Batches, 2010, J. Mach. Learn. Res.
[18] A. S. Nemirovsky, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[19] Martin J. Wainwright, et al. Divide and Conquer Kernel Ridge Regression, 2013, COLT.
[20] Peter Richtárik, et al. Better Communication Complexity for Local SGD, 2019, ArXiv.
[21] Matthew J. Streeter, et al. Adaptive Bound Optimization for Online Convex Optimization, 2010, COLT.
[22] Alexander J. Smola, et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[23] Konstantin Mishchenko, et al. Tighter Theory for Local SGD on Identical and Heterogeneous Data, 2020, AISTATS.
[24] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[25] Nathan Srebro, et al. Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox, 2017, COLT.
[26] Martin J. Wainwright, et al. Communication-efficient algorithms for statistical optimization, 2012, 51st IEEE Conference on Decision and Control (CDC).
[27] Jonathan D. Rosenblatt, et al. On the Optimality of Averaging in Distributed Statistical Learning, 2014, ArXiv abs/1407.2724.
[28] Aymeric Dieuleveut, et al. Communication trade-offs for Local-SGD with large step size, 2019, NeurIPS.
[29] Max Simchowitz. On the Randomized Complexity of Minimizing a Convex Quadratic Function, 2018, ArXiv.
[30] Ohad Shamir, et al. Better Mini-Batch Algorithms via Accelerated Gradient Methods, 2011, NIPS.
[31] Sebastian U. Stich, et al. Local SGD Converges Fast and Communicates Little, 2018, ICLR.
[32] Gregory F. Coppola. Iterative parameter mixing for distributed large-margin training of structured predictors for natural language processing, 2015.
[33] Saeed Ghadimi, et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization, II: Shrinking Procedures and Optimal Algorithms, 2013, SIAM J. Optim.
[34] Fan Zhou, et al. On the convergence properties of a K-step averaging stochastic gradient descent algorithm for nonconvex optimization, 2017, IJCAI.
[35] Sashank J. Reddi, et al. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning, 2019, ICML.
[36] Ioannis Mitliagkas, et al. Parallel SGD: When does averaging help?, 2016, ArXiv.