[1] Ohad Shamir, et al. Distributed stochastic optimization and learning, 2014, 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[2] Peter Richtárik, et al. Distributed Learning with Compressed Gradient Differences, 2019, ArXiv.
[3] Anit Kumar Sahu, et al. Federated Optimization in Heterogeneous Networks, 2018, MLSys.
[4] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[5] J. T. Spooner, et al. Adaptive and Learning Systems for Signal Processing, Communications, and Control, 2013.
[6] Sofiane Saadane, et al. On the rates of convergence of parallelized averaged stochastic gradient algorithms, 2017, Statistics.
[7] Bin Yu, et al. Three principles of data science: predictability, computability, and stability (PCS), 2018, IEEE International Conference on Big Data (Big Data).
[8] Ohad Shamir, et al. Better Mini-Batch Algorithms via Accelerated Gradient Methods, 2011, NIPS.
[9] Jianyu Wang, et al. Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms, 2018, ArXiv.
[10] Dimitris S. Papailiopoulos, et al. Perturbed Iterate Analysis for Asynchronous Stochastic Optimization, 2015, SIAM J. Optim.
[11] Saeed Ghadimi, et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmic Framework, 2012, SIAM J. Optim.
[12] Sebastian U. Stich, et al. Local SGD Converges Fast and Communicates Little, 2018, ICLR.
[13] Martin J. Wainwright, et al. Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates, 2013, J. Mach. Learn. Res.
[14] Anupam Gupta, et al. Potential-Function Proofs for Gradient Methods, 2019, Theory Comput.
[15] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[16] Sebastian U. Stich, et al. The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication, 2019, ArXiv.
[17] Yair Carmon, et al. Accelerated Methods for Non-Convex Optimization, 2018, SIAM J. Optim.
[18] Yoram Singer, et al. Train faster, generalize better: Stability of stochastic gradient descent, 2015, ICML.
[19] Xiang Li, et al. On the Convergence of FedAvg on Non-IID Data, 2019, ICLR.
[20] Aryan Mokhtari, et al. FedPAQ: A Communication-Efficient Federated Learning Method with Periodic Averaging and Quantization, 2019, AISTATS.
[21] Farzin Haddadpour, et al. Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization, 2019, NeurIPS.
[22] Yurii Nesterov, et al. Lectures on Convex Optimization, 2018.
[23] Farzin Haddadpour, et al. Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization, 2019, ICML.
[24] Sebastian U. Stich, et al. Unified Optimal Analysis of the (Stochastic) Gradient Method, 2019, ArXiv.
[25] Gideon S. Mann, et al. Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models, 2009, NIPS.
[26] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[27] Mark W. Schmidt, et al. A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method, 2012, ArXiv.
[28] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.
[29] Aymeric Dieuleveut, et al. Communication trade-offs for Local-SGD with large step size, 2019, NeurIPS.
[30] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[31] Alexander J. Smola, et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[32] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, ArXiv.
[33] Rong Jin, et al. On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization, 2019, ICML.
[34] Ohad Shamir, et al. Optimal Distributed Online Prediction Using Mini-Batches, 2010, J. Mach. Learn. Res.
[35] A. S. Nemirovsky, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[36] Suhas Diggavi, et al. Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations, 2019, IEEE Journal on Selected Areas in Information Theory.
[37] Jianyu Wang, et al. SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum, 2020, ICLR.
[38] Rong Jin, et al. On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization, 2019, ICML.
[39] Gregory F. Coppola. Iterative parameter mixing for distributed large-margin training of structured predictors for natural language processing, 2015.
[40] Richard Nock, et al. Advances and Open Problems in Federated Learning, 2019, Found. Trends Mach. Learn.
[41] Jonathan D. Rosenblatt, et al. On the Optimality of Averaging in Distributed Statistical Learning, 2014, ArXiv.
[42] Tianbao Yang, et al. Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement, 2017.
[43] Martin J. Wainwright, et al. FedSplit: An algorithmic framework for fast federated optimization, 2020, NeurIPS.
[44] Ohad Shamir, et al. Is Local SGD Better than Minibatch SGD?, 2020, ICML.
[45] Fan Zhou, et al. On the convergence properties of a K-step averaging stochastic gradient descent algorithm for nonconvex optimization, 2017, IJCAI.
[46] Sashank J. Reddi, et al. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning, 2019, ICML.
[47] Benjamin Recht, et al. Analysis and Design of Optimization Algorithms via Integral Quadratic Constraints, 2014, SIAM J. Optim.
[48] O. Mangasarian. Parallel Gradient Distribution in Unconstrained Optimization, 1995.
[49] Prateek Jain, et al. Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-batching, Averaging, and Model Misspecification, 2016, J. Mach. Learn. Res.
[50] Bin Yu, et al. Stability and Convergence Trade-off of Iterative Optimization Algorithms, 2018, ArXiv.
[51] Shenghuo Zhu, et al. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning, 2018, AAAI.
[52] H. Robbins. A Stochastic Approximation Method, 1951.
[53] Farzin Haddadpour, et al. On the Convergence of Local Descent Methods in Federated Learning, 2019, ArXiv.
[54] Martin J. Wainwright, et al. Communication-efficient algorithms for statistical optimization, 2012, 51st IEEE Conference on Decision and Control (CDC).
[55] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.
[56] Konstantin Mishchenko, et al. Tighter Theory for Local SGD on Identical and Heterogeneous Data, 2020, AISTATS.
[57] Martin Jaggi, et al. A Unified Theory of Decentralized SGD with Changing Topology and Local Updates, 2020, ICML.
[58] Martin Jaggi, et al. Sparsified SGD with Memory, 2018, NeurIPS.