Wotao Yin | Tianyi Chen | Yuejiao Sun
[1] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[2] Junzhou Huang, et al. Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization, 2018, ICML.
[3] Kenneth Heafield, et al. Sparse Communication for Distributed Gradient Descent, 2017, EMNLP.
[4] Dan Alistarh, et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning, 2017, ICML.
[5] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[6] Georgios B. Giannakis, et al. Sensor-Centric Data Reduction for Estimation With WSNs via Censoring and Quantization, 2012, IEEE Transactions on Signal Processing.
[7] Sebastian U. Stich, et al. Local SGD Converges Fast and Communicates Little, 2018, ICLR.
[8] Nikko Strom, et al. Scalable distributed DNN training using commodity GPU cloud computing, 2015, INTERSPEECH.
[9] Michael G. Rabbat, et al. Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization, 2017, Proceedings of the IEEE.
[10] Tao Lin, et al. Don't Use Large Mini-Batches, Use Local SGD, 2018, ICLR.
[11] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[12] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[13] Dan Alistarh, et al. The Convergence of Sparsified Gradient Methods, 2018, NeurIPS.
[14] Georgios B. Giannakis, et al. LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning, 2018, NeurIPS.
[15] Na Li, et al. On Maintaining Linear Convergence of Distributed Learning and Optimization under Limited Communication, 2019, 53rd Asilomar Conference on Signals, Systems, and Computers.
[16] Yun Yang, et al. Communication-Efficient Distributed Statistical Inference, 2016, Journal of the American Statistical Association.
[17] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[18] Asuman E. Ozdaglar, et al. Distributed Subgradient Methods for Multi-Agent Optimization, 2009, IEEE Transactions on Automatic Control.
[19] Georgios B. Giannakis, et al. Communication-Efficient Distributed Learning via Lazily Aggregated Quantized Gradients, 2019, NeurIPS.
[20] Shenghuo Zhu, et al. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning, 2018, AAAI.
[21] Bruce S. Davie, et al. Computer Networks: A Systems Approach, 1996.
[22] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[23] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[24] Ohad Shamir, et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization, 2011, ICML.
[25] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[26] Dimitris S. Papailiopoulos, et al. ATOMO: Communication-efficient Learning via Atomic Sparsification, 2018, NeurIPS.
[27] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[28] Jianyu Wang, et al. Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms, 2018, arXiv.
[29] Qing Ling, et al. Communication-Censored Distributed Stochastic Gradient Descent, 2019, arXiv.
[30] Martin Jaggi, et al. Sparsified SGD with Memory, 2018, NeurIPS.
[31] Rong Jin, et al. On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization, 2019, ICML.
[32] Kamyar Azizzadenesheli, et al. signSGD: Compressed Optimisation for Non-Convex Problems, 2018, ICML.