A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks
Shaohuai Shi | Zhenheng Tang | Yuxin Wang | Xiaowen Chu | Qiang Wang | Kaiyong Zhao | Xiang Huang