Iteration number-based hierarchical gradient aggregation for distributed deep learning