CEFS: compute-efficient flow scheduling for iterative synchronous applications
[1] Alex X. Liu, et al. Friends, not Foes – Synthesizing Existing Transport Strategies for Data Center Networks, 2014.
[2] Nick McKeown, et al. pFabric: minimal near-optimal datacenter transport, 2013, SIGCOMM.
[3] Shuai Wang, et al. Geryon: Accelerating Distributed CNN Training by Network-Level Flow Scheduling, 2020, IEEE INFOCOM 2020 - IEEE Conference on Computer Communications.
[4] Joshua Romero, et al. Exascale Deep Learning for Scientific Inverse Problems, 2019, ArXiv.
[5] Yiming Zhang, et al. Rate-aware flow scheduling for commodity data center networks, 2017, IEEE INFOCOM 2017 - IEEE Conference on Computer Communications.
[6] Kai Chen, et al. Towards Zero Copy Dataflows using RDMA, 2017, SIGCOMM Posters and Demos.
[7] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[8] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[9] Ion Stoica, et al. Coflow: a networking abstraction for cluster applications, 2012, HotNets-XI.
[10] Amar Phanishayee, et al. Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training, 2018, SoCC.
[11] Shuai Wang, et al. HiPS: Hierarchical Parameter Synchronization in Large-Scale Distributed Machine Learning, 2018, NetAI@SIGCOMM.
[12] Ion Stoica, et al. Efficient coflow scheduling with Varys, 2014, SIGCOMM.
[13] Tao Zhang, et al. EFLOPS: Algorithm and System Co-Design for a High Performance Distributed Training Platform, 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[14] Panos Kalnis, et al. Scaling Distributed Machine Learning with In-Network Aggregation, 2019, NSDI.
[15] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[16] Yibo Zhu, et al. A generic communication scheduler for distributed DNN training acceleration, 2019, SOSP.
[17] James Demmel, et al. ImageNet Training in Minutes, 2017, ICPP.
[18] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[19] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[20] Dhabaleswar K. Panda, et al. Accelerating TensorFlow with Adaptive RDMA-Based gRPC, 2018, 2018 IEEE 25th International Conference on High Performance Computing (HiPC).
[21] Panos Kalnis, et al. In-Network Computation is a Dumb Idea Whose Time Has Come, 2017, HotNets.
[22] Leslie G. Valiant, et al. A bridging model for parallel computation, 1990, CACM.
[23] Chuan Wu, et al. Optimus: an efficient dynamic resource scheduler for deep learning clusters, 2018, EuroSys.
[24] Wei Zhang, et al. Asynchronous Decentralized Parallel Stochastic Gradient Descent, 2017, ICML.
[25] Pengtao Xie, et al. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters, 2017, USENIX Annual Technical Conference.
[26] Antony I. T. Rowstron, et al. Decentralized task-aware scheduling for data center networks, 2014, SIGCOMM.
[27] Shuchang Zhou, et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, 2016, ArXiv.
[28] Alexander J. Smola, et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[29] Wencong Xiao, et al. Gandiva: Introspective Cluster Scheduling for Deep Learning, 2018, OSDI.
[30] Sangeetha Abdu Jyothi, et al. TicTac: Accelerating Distributed Deep Learning with Communication Scheduling, 2018, MLSys.
[31] Wencong Xiao, et al. Multi-tenant GPU Clusters for Deep Learning Workloads: Analysis and Implications, 2018.
[32] Dan Li, et al. Impact of Network Topology on the Performance of DML: Theoretical Analysis and Practical Factors, 2019, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.
[33] Hiroaki Mikami, et al. Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash, 2018.
[34] Gennady Pekhimenko, et al. Priority-based Parameter Propagation for Distributed DNN Training, 2019, SysML.
[35] Ion Stoica, et al. Efficient Coflow Scheduling Without Prior Knowledge, 2015, SIGCOMM.
[36] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[37] Yuanzhou Yang, et al. Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes, 2018, ArXiv.
[38] Hai Jin, et al. Heterogeneity and Interference-Aware Virtual Machine Provisioning for Predictable Performance in the Cloud, 2016, IEEE Transactions on Computers.
[39] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[40] Ion Stoica, et al. Efficient Coflow Scheduling Without Prior Knowledge, 2015.
[41] Jiawei Jiang, et al. Heterogeneity-aware Distributed Parameter Servers, 2017, SIGMOD Conference.
[42] Joseph Gonzalez, et al. On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent, 2018, ArXiv.
[43] Yongqiang Xiong, et al. Congestion Control for High-speed Extremely Shallow-buffered Datacenter Networks, 2017, APNet.
[44] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[46] Wei Bai, et al. Information-Agnostic Flow Scheduling for Commodity Data Centers, 2015, NSDI.
[47] Bo Li, et al. Fast Distributed Deep Learning via Worker-adaptive Batch Sizing, 2018, SoCC.
[48] Shaohuai Shi, et al. MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms, 2018, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.
[49] Kang G. Shin, et al. Tiresias: A GPU Cluster Manager for Distributed Deep Learning, 2019, NSDI.
[50] Haitao Wu, et al. RDMA over Commodity Ethernet at Scale, 2016, SIGCOMM.
[51] Feng Liu, et al. AuTO: scaling deep reinforcement learning for datacenter-scale automatic traffic optimization, 2018, SIGCOMM.
[52] Michael J. Freedman, et al. SLAQ: quality-driven scheduling for distributed machine learning, 2017, SoCC.
[53] Jonas Mockus, et al. Application of Bayesian approach to numerical methods of global and stochastic optimization, 1994, J. Glob. Optim.
[54] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[55] Krishnendu Chakrabarty, et al. Lotus: A New Topology for Large-scale Distributed Machine Learning, 2020, ACM J. Emerg. Technol. Comput. Syst.