Prophet: Speeding up Distributed DNN Training with Predictable Communication Scheduling
Zhenwei Zhang | Qiang Qi | Ruitao Shang | Li Chen | Fei Xu