On the Convergence of Local Descent Methods in Federated Learning