Priority-based parameter propagation for distributed deep neural network training
[1] Wei Zhang, et al. AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training, 2017, AAAI.
[2] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[4] Alexander J. Smola, et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[5] Amar Phanishayee, et al. TBD: Benchmarking and Analyzing Deep Neural Network Training, 2018, ArXiv.
[6] Shaohuai Shi, et al. Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs, 2017, 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech).
[7] David Cunningham, et al. Gigabit Ethernet Networking, 1999.
[8] 정혜동, et al. A Study on the Optimal Packet Size for Upper-Layer Applications in InfiniBand-Based Data Transmission, 2015.
[9] Abhinav Vishnu, et al. GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent, 2018, ArXiv.
[10] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[11] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[12] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[13] Dhabaleswar K. Panda, et al. Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?, 2017, EuroMPI.
[14] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.
[15] Jing Li, et al. DNN Model Compression Under Accuracy Constraints, 2018.
[16] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[17] Amar Phanishayee, et al. Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training, 2018, SoCC.
[18] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[19] Pengtao Xie, et al. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters, 2017, USENIX Annual Technical Conference.
[20] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[21] Samy Bengio, et al. Revisiting Distributed Synchronous SGD, 2016, ArXiv.
[22] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, ArXiv:1610.02132.
[24] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[25] Martin Burtscher, et al. FPC: A High-Speed Compressor for Double-Precision Floating-Point Data, 2009, IEEE Transactions on Computers.
[26] Kenneth Heafield, et al. Sparse Communication for Distributed Gradient Descent, 2017, EMNLP.
[27] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[28] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[29] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[30] Sangeetha Abdu Jyothi, et al. Communication Scheduling as a First-Class Citizen in Distributed Machine Learning Systems, 2018, ArXiv.
[31] Domenic Bottini, et al. Network Evolution for DNNs, 2018.
[32] Chong Wang, et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, 2015, ICML.
[33] Matt Post, et al. Sockeye: A Toolkit for Neural Machine Translation, 2018, ArXiv.
[34] Janis Keuper, et al. Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability, 2016, 2016 2nd Workshop on Machine Learning in HPC Environments (MLHPC).