Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning
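All of the works below analyze some variant of the asynchronous SGD update x_{t+1} = x_t - gamma * g(x_{t - tau_t}), in which each gradient is computed at a stale iterate whose delay tau_t reflects how long the contributing worker took. The snippet below is a minimal single-process simulation of that update, included only to make the object of study concrete; the quadratic objective, Gaussian gradient noise, and uniformly random delays are illustrative assumptions, not the setting of this paper or of any particular reference.

import random

DIM = 5
GAMMA = 0.05     # step size
MAX_DELAY = 4    # worst-case gradient staleness tau_max (an assumption)
STEPS = 2000
NOISE = 0.1      # scale of the stochastic gradient noise

def stochastic_grad(x):
    # Noisy gradient of f(x) = 0.5 * ||x||^2, so grad f(x) = x.
    return [xi + NOISE * random.gauss(0.0, 1.0) for xi in x]

x = [1.0] * DIM
history = [list(x)]  # past iterates; a slow worker reads a stale one

for t in range(STEPS):
    tau = random.randint(0, min(MAX_DELAY, t))  # staleness of this update
    stale_x = history[t - tau]                  # iterate the worker actually saw
    g = stochastic_grad(stale_x)                # gradient at the stale point
    x = [xi - GAMMA * gi for xi, gi in zip(x, g)]
    history.append(list(x))

print("final squared norm:", sum(xi * xi for xi in x))

The stability visible in this toy run, a step size kept small relative to the maximum staleness, loosely mirrors why classical analyses scale the step size down with the worst-case delay, while several of the entries below study delay-adaptive alternatives.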
[1] Blake E. Woodworth, et al. Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays, 2022, NeurIPS.
[2] Prafulla Dhariwal, et al. Hierarchical Text-Conditional Image Generation with CLIP Latents, 2022, ArXiv.
[3] Hamid Reza Feyzmahdavian, et al. Delay-adaptive step-sizes for asynchronous learning, 2022, ICML.
[4] Jia Liu, et al. Anarchic Federated Learning, 2021, ICML.
[5] Assaf Schuster, et al. Learning Under Delayed Feedback: Implicitly Adapting to Gradient Delays, 2021, ArXiv.
[6] Amit Daniely, et al. Asynchronous Stochastic Optimization Robust to Arbitrary Delays, 2021, NeurIPS.
[7] Michael G. Rabbat, et al. Federated Learning with Buffered Asynchronous Aggregation, 2021, AISTATS.
[8] Longbo Huang, et al. Fast Federated Learning in the Presence of Arbitrary Device Unavailability, 2021, NeurIPS.
[9] Martin Jaggi, et al. Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates, 2021, AISTATS.
[10] Alec Radford, et al. Zero-Shot Text-to-Image Generation, 2021, ICML.
[11] Xiangnan He, et al. A Survey on Large-Scale Machine Learning, 2020, IEEE Transactions on Knowledge and Data Engineering.
[12] Angelia Nedic, et al. Distributed Gradient Methods for Convex Machine Learning Problems in Networks: Distributed Optimization, 2020, IEEE Signal Processing Magazine.
[13] Martin Jaggi, et al. A Unified Theory of Decentralized SGD with Changing Topology and Local Updates, 2020, ICML.
[14] Shaojie Tang, et al. Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability, 2020, INFORMS Journal on Computing.
[15] Richard Nock, et al. Advances and Open Problems in Federated Learning, 2019, Found. Trends Mach. Learn.
[16] John C. Duchi, et al. Lower bounds for non-convex stochastic optimization, 2019, Mathematical Programming.
[17] Sashank J. Reddi, et al. SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning, 2019, ArXiv.
[18] M. Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[19] Martin Jaggi, et al. PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization, 2019, NeurIPS.
[20] Hubert Eichner, et al. Towards Federated Learning at Scale: System Design, 2019, SysML.
[21] Michael G. Rabbat, et al. Stochastic Gradient Push for Distributed Deep Learning, 2018, ICML.
[22] Dan Alistarh, et al. The Convergence of Sparsified Gradient Methods, 2018, NeurIPS.
[23] Ohad Shamir, et al. A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates, 2018, ALT.
[24] Sebastian U. Stich. Local SGD Converges Fast and Communicates Little, 2018, ICLR.
[25] Parijat Dube, et al. Slow and Stale Gradients Can Win the Race, 2018, IEEE Journal on Selected Areas in Information Theory.
[26] Peter Richtárik, et al. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption, 2018, ICML.
[27] Fabian Pedregosa, et al. Improved asynchronous parallel optimization analysis for stochastic incremental methods, 2018, J. Mach. Learn. Res.
[28] Wei Zhang, et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent, 2017, NIPS.
[29] Hamid Reza Feyzmahdavian, et al. Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server, 2016, ArXiv.
[30] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, ArXiv.
[31] Nenghai Yu, et al. Asynchronous Stochastic Gradient Descent with Delay Compensation, 2016, ICML.
[32] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[33] Alexander J. Smola, et al. AdaDelay: Delay Adaptive Distributed Stochastic Optimization, 2016, AISTATS.
[34] Blaise Agüera y Arcas, et al. Federated Learning of Deep Networks using Model Averaging, 2016, ArXiv.
[35] Christopher Ré, et al. Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care, 2015, NIPS.
[36] Ji Liu, et al. Staleness-Aware Async-SGD for Distributed Deep Learning, 2015, IJCAI.
[37] Dimitris S. Papailiopoulos, et al. Perturbed Iterate Analysis for Asynchronous Stochastic Optimization, 2015, SIAM J. Optim.
[38] Yijun Huang, et al. Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization, 2015, NIPS.
[39] Hamid Reza Feyzmahdavian, et al. An asynchronous mini-batch algorithm for regularized stochastic optimization, 2015, 54th IEEE Conference on Decision and Control (CDC).
[40] Matthew J. Streeter, et al. Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning, 2014, NIPS.
[41] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[42] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[43] Stephen J. Wright, et al. Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[44] John C. Duchi, et al. Distributed delayed stochastic optimization, 2011, 51st IEEE Conference on Decision and Control (CDC).
[45] Ohad Shamir, et al. Optimal Distributed Online Prediction Using Mini-Batches, 2010, J. Mach. Learn. Res.
[46] Alexander J. Smola, et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[47] Gideon S. Mann, et al. Distributed Training Strategies for the Structured Perceptron, 2010, NAACL.
[48] Olvi L. Mangasarian, et al. Backpropagation Convergence via Deterministic Nonmonotone Perturbed Minimization, 1993, NIPS.
[49] H. Robbins, et al. A Stochastic Approximation Method, 1951, The Annals of Mathematical Statistics.
[50] Mary Wootters, et al. Asynchronous Distributed Optimization with Stochastic Delays, 2022, AISTATS.
[51] Hadrien Hendrikx, et al. Decentralized Optimization with Heterogeneous Delays: a Continuous-Time Approach, 2021, ArXiv.
[52] Shiva Prasad Kasiviswanathan, et al. Federated Learning under Arbitrary Communication Patterns, 2021, ICML.
[53] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .