[1] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[2] Yaoliang Yu, et al. Petuum: A New Platform for Distributed Machine Learning on Big Data, 2015, IEEE Trans. Big Data.
[3] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[4] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[5] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.
[6] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[7] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[8] Martin Jaggi, et al. Sparsified SGD with Memory, 2018, NeurIPS.
[9] Ji Liu, et al. Gradient Sparsification for Communication-Efficient Distributed Optimization, 2017, NeurIPS.
[10] Alexander J. Smola, et al. Communication Efficient Distributed Machine Learning with the Parameter Server, 2014, NIPS.
[11] Jean-Baptiste Cordonnier, et al. Convex Optimization using Sparsified Stochastic Gradient Descent with Memory, 2018.
[12] Martin Jaggi, et al. Error Feedback Fixes SignSGD and other Gradient Compression Schemes, 2019, ICML.
[13] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[14] Kamyar Azizzadenesheli, et al. signSGD: compressed optimisation for non-convex problems, 2018, ICML.
[15] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[16] Nikko Strom, et al. Scalable distributed DNN training using commodity GPU cloud computing, 2015, INTERSPEECH.
[17] Junzhou Huang, et al. Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization, 2018, ICML.
[18] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.
[19] Samy Bengio, et al. Revisiting Distributed Synchronous SGD, 2016, ArXiv.
[20] Ji Liu, et al. DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression, 2019, ICML.
[21] Kenneth Heafield, et al. Sparse Communication for Distributed Gradient Descent, 2017, EMNLP.
[22] Wojciech Zaremba, et al. Recurrent Neural Network Regularization, 2014, ArXiv.
[23] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[24] Yann LeCun, et al. Large Scale Online Learning, 2003, NIPS.
[25] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[26] Alexander J. Smola, et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[27] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, ArXiv (abs/1610.02132).
[28] Kamyar Azizzadenesheli, et al. signSGD with Majority Vote is Communication Efficient and Fault Tolerant, 2018, ICLR.