Aryan Mokhtari | Farzin Haddadpour | Mehrdad Mahdavi | Mohammad Mahdi Kamani