Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning
Sashank J. Reddi | Sebastian U. Stich | Sai Praneeth Karimireddy | Mehryar Mohri | Ananda Theertha Suresh | Martin Jaggi | Satyen Kale