暂无分享,去创建一个
[1] Olatunji Ruwase,et al. ZeRO: Memory Optimization Towards Training A Trillion Parameter Models , 2019, SC.
[2] Richard Nock,et al. Advances and Open Problems in Federated Learning , 2019, Found. Trends Mach. Learn..
[3] Martin Jaggi,et al. Error Feedback Fixes SignSGD and other Gradient Compression Schemes , 2019, ICML.
[4] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[5] Asuman E. Ozdaglar,et al. Distributed Subgradient Methods for Multi-Agent Optimization , 2009, IEEE Transactions on Automatic Control.
[6] Martin Jaggi,et al. Decentralized Deep Learning with Arbitrary Communication Compression , 2019, ICLR.
[7] David J. Schwab,et al. The Early Phase of Neural Network Training , 2020, ICLR.
[8] Nam Sung Kim,et al. GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training , 2018, NeurIPS.
[9] Christopher De Sa,et al. Moniqua: Modulo Quantized Communication in Decentralized SGD , 2020, ICML.
[10] Martin Jaggi,et al. PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization , 2019, NeurIPS.
[11] Dan Alistarh,et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks , 2016, 1610.02132.
[12] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[13] Martin Jaggi,et al. A Unified Theory of Decentralized SGD with Changing Topology and Local Updates , 2020, ICML.
[14] Martin Jaggi,et al. Sparsified SGD with Memory , 2018, NeurIPS.
[15] Wei Zhang,et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent , 2017, NIPS.
[16] Stephen P. Boyd,et al. Fast linear iterations for distributed averaging , 2003, 42nd IEEE International Conference on Decision and Control (IEEE Cat. No.03CH37475).
[17] Martin Jaggi,et al. Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication , 2019, ICML.
[18] Mehryar Mohri,et al. SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning , 2019, ArXiv.
[19] Michael G. Rabbat,et al. Stochastic Gradient Push for Distributed Deep Learning , 2018, ICML.
[20] Ahmed M. Abdelmoniem,et al. Compressed Communication for Distributed Deep Learning: Survey and Quantitative Evaluation , 2020 .
[21] Cong Xu,et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning , 2017, NIPS.
[22] Vladimir Braverman,et al. Communication-efficient distributed SGD with Sketching , 2019, NeurIPS.
[23] Sebastian U. Stich,et al. The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication , 2019, ArXiv.
[24] Angelia Nedic,et al. Distributed Gradient Methods for Convex Machine Learning Problems in Networks: Distributed Optimization , 2020, IEEE Signal Processing Magazine.
[25] Ji Liu,et al. DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression , 2019, ICML.
[26] Hanlin Tang,et al. Communication Compression for Decentralized Training , 2018, NeurIPS.
[27] Ji Liu,et al. Gradient Sparsification for Communication-Efficient Distributed Optimization , 2017, NeurIPS.
[28] Dimitris S. Papailiopoulos,et al. ATOMO: Communication-efficient Learning via Atomic Sparsification , 2018, NeurIPS.
[29] John N. Tsitsiklis,et al. Problems in decentralized decision making and computation , 1984 .
[30] Angelia Nedic,et al. Distributed stochastic gradient tracking methods , 2018, Mathematical Programming.
[31] William J. Dally,et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training , 2017, ICLR.
[32] Dong Yu,et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs , 2014, INTERSPEECH.
[33] Kamyar Azizzadenesheli,et al. signSGD with Majority Vote is Communication Efficient and Fault Tolerant , 2018, ICLR.