Breaking the centralized barrier for cross-device federated learning
Sashank J. Reddi | Sebastian U. Stich | Sai Praneeth Karimireddy | Martin Jaggi | Satyen Kale