Combining Global Sparse Gradients with Local Gradients in Distributed Neural Network Training
暂无分享,去创建一个
[1] William J. Dally,et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training , 2017, ICLR.
[2] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[3] André F. T. Martins,et al. Marian: Fast Neural Machine Translation in C++ , 2018, ACL.
[4] Parijat Dube,et al. Slow and Stale Gradients Can Win the Race , 2018, IEEE Journal on Selected Areas in Information Theory.
[5] Nikko Strom,et al. Scalable distributed DNN training using commodity GPU cloud computing , 2015, INTERSPEECH.
[6] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[7] Cong Xu,et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning , 2017, NIPS.
[8] Dong Yu,et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs , 2014, INTERSPEECH.
[9] Marcin Junczys-Dowmunt,et al. Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation , 2018, EMNLP.
[10] Dan Alistarh,et al. QSGD: Randomized Quantization for Communication-Optimal Stochastic Gradient Descent , 2016, ArXiv.
[11] Rico Sennrich,et al. Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.
[12] Ankur Bapna,et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation , 2018, ACL.
[13] Kenneth Heafield,et al. Making Asynchronous Stochastic Gradient Descent Work for Transformers , 2019, NGT@EMNLP-IJCNLP.
[14] Sam Ade Jacobs,et al. Communication Quantization for Data-Parallel Training of Deep Neural Networks , 2016, 2016 2nd Workshop on Machine Learning in HPC Environments (MLHPC).
[15] Rico Sennrich,et al. The University of Edinburgh’s Neural MT Systems for WMT17 , 2017, WMT.
[16] Blaise Agüera y Arcas,et al. Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.
[17] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[18] Myle Ott,et al. Scaling Neural Machine Translation , 2018, WMT.
[19] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[20] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[21] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[22] Rico Sennrich,et al. Edinburgh Neural Machine Translation Systems for WMT 16 , 2016, WMT.
[23] Matthew J. Streeter,et al. Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning , 2014, NIPS.
[24] Kenneth Heafield,et al. Sparse Communication for Distributed Gradient Descent , 2017, EMNLP.