Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top