Wotao Yin | Tianyi Chen | Ziye Guo | Yuejiao Sun
[1] H. Robbins. A Stochastic Approximation Method, 1951.
[2] Vladimir Braverman, et al. Communication-efficient distributed SGD with Sketching, 2019, NeurIPS.
[3] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[4] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[5] Francis Bach, et al. On the Convergence of Adam and Adagrad, 2020, ArXiv.
[6] Tao Lin, et al. Don't Use Large Mini-Batches, Use Local SGD, 2018, ICLR.
[7] Georgios B. Giannakis, et al. LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning, 2018, NeurIPS.
[8] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[9] Martin Jaggi, et al. Sparsified SGD with Memory, 2018, NeurIPS.
[10] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[11] Qing Ling, et al. Asynchronous periodic event-triggered coordination of multi-agent systems, 2017, IEEE 56th Annual Conference on Decision and Control (CDC).
[12] Martin Jaggi, et al. Error Feedback Fixes SignSGD and other Gradient Compression Schemes, 2019, ICML.
[13] Dan Alistarh, et al. The Convergence of Sparsified Gradient Methods, 2018, NeurIPS.
[14] Peter Richtárik, et al. First Analysis of Local GD on Heterogeneous Data, 2019, ArXiv.
[15] Dan Alistarh, et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning, 2017, ICML.
[16] Cho-Jui Hsieh, et al. A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order, 2016, NIPS.
[17] Hanlin Tang, et al. Communication Compression for Decentralized Training, 2018, NeurIPS.
[18] Jorge Cortés, et al. Event-triggered communication and control of networked systems for multi-agent consensus, 2017, Autom.
[19] Sebastian U. Stich, et al. Local SGD Converges Fast and Communicates Little, 2018, ICLR.
[20] Jianyu Wang, et al. SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum, 2020, ICLR.
[21] Anit Kumar Sahu, et al. Federated Optimization in Heterogeneous Networks, 2018, MLSys.
[22] Sashank J. Reddi, et al. SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning, 2019, ArXiv.
[23] Kamyar Azizzadenesheli, et al. signSGD: compressed optimisation for non-convex problems, 2018, ICML.
[24] Karl Henrik Johansson, et al. Distributed Event-Triggered Control for Multi-Agent Systems, 2012, IEEE Transactions on Automatic Control.
[25] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[26] Stefan Wrobel, et al. Efficient Decentralized Deep Learning by Dynamic Model Averaging, 2018, ECML/PKDD.
[27] Ji Liu, et al. Gradient Sparsification for Communication-Efficient Distributed Optimization, 2017, NeurIPS.
[28] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[29] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[30] Rong Jin, et al. On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization, 2019, ICML.
[31] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[32] Lijun Zhang, et al. SAdam: A Variant of Adam for Strongly Convex Functions, 2019, ICLR.
[33] Nikko Strom, et al. Scalable distributed DNN training using commodity GPU cloud computing, 2015, INTERSPEECH.
[34] Jie Chen, et al. Asynchronous parallel adaptive stochastic gradient methods, 2020, ArXiv.
[35] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[36] Ji Liu, et al. APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm, 2020, ArXiv.
[37] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.
[38] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[39] Peng Jiang, et al. A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication, 2018, NeurIPS.
[40] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.
[41] Ohad Shamir, et al. Communication-Efficient Distributed Optimization using an Approximate Newton-type Method, 2013, ICML.
[42] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[43] Xiaoxia Wu, et al. AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization, 2018, ICML.
[44] Martin Jaggi, et al. PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization, 2019, NeurIPS.
[45] Farzin Haddadpour, et al. Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization, 2019, NeurIPS.
[46] Jianyu Wang, et al. Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms, 2018, ArXiv.
[47] Michael I. Jordan, et al. Distributed optimization with arbitrary local solvers, 2015, Optim. Methods Softw.
[48] Sebastian U. Stich, et al. Stochastic Distributed Learning with Gradient Quantization and Variance Reduction, 2019, arXiv:1904.05115.
[49] Saeed Ghadimi, et al. Accelerated gradient methods for nonconvex nonlinear and stochastic programming, 2013, Mathematical Programming.
[50] Yann LeCun, et al. Deep learning with Elastic Averaging SGD, 2014, NIPS.
[51] Mingyi Hong, et al. On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization, 2018, ICLR.
[52] Manzil Zaheer, et al. Adaptive Federated Optimization, 2020, ICLR.
[53] Junzhou Huang, et al. Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization, 2018, ICML.
[54] Richard Nock, et al. Advances and Open Problems in Federated Learning, 2021, Found. Trends Mach. Learn.
[55] Wotao Yin, et al. LASG: Lazily Aggregated Stochastic Gradients for Communication-Efficient Distributed Learning, 2020, ArXiv.
[56] Kenneth Heafield, et al. Sparse Communication for Distributed Gradient Descent, 2017, EMNLP.
[57] Francesco Orabona, et al. On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes, 2018, AISTATS.
[58] Georgios B. Giannakis, et al. Communication-Efficient Distributed Learning via Lazily Aggregated Quantized Gradients, 2019, NeurIPS.
[59] Shenghuo Zhu, et al. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning, 2018, AAAI.
[60] Mehdi Bennis, et al. Wireless Network Intelligence at the Edge, 2018, Proceedings of the IEEE.
[61] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
[62] Thomas Hofmann, et al. Communication-Efficient Distributed Dual Coordinate Ascent, 2014, NIPS.
[63] Aryan Mokhtari, et al. Robust and Communication-Efficient Collaborative Learning, 2019, NeurIPS.