Green, Yellow, Yield: End-Host Traffic Scheduling for Distributed Deep Learning with TensorLights
[1] Sangeetha Abdu Jyothi, et al. TicTac: Accelerating Distributed Deep Learning with Communication Scheduling, 2018, MLSys.
[2] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[3] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Amin Vahdat, et al. Sincronia: near-optimal network design for coflows, 2018, SIGCOMM.
[5] Gennady Pekhimenko, et al. Priority-based Parameter Propagation for Distributed DNN Training, 2019, SysML.
[6] Yanhui Geng, et al. CODA: Toward Automatically Identifying and Scheduling Coflows in the Dark, 2016, SIGCOMM.
[7] Chuan Wu, et al. Optimus: an efficient dynamic resource scheduler for deep learning clusters, 2018, EuroSys.
[8] Ion Stoica, et al. Efficient Coflow Scheduling Without Prior Knowledge, 2015, SIGCOMM.
[9] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[10] T. S. Eugene Ng, et al. The Impact of Virtualization on Network Performance of Amazon EC2 Data Center, 2010, 2010 Proceedings IEEE INFOCOM.
[11] Pengtao Xie, et al. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters, 2017, USENIX Annual Technical Conference.
[12] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[13] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[14] Ion Stoica, et al. Coflow: a networking abstraction for cluster applications, 2012, HotNets-XI.
[15] Amar Phanishayee, et al. Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training, 2018, SoCC.
[16] Randy H. Katz, et al. Improving MapReduce Performance in Heterogeneous Environments, 2008, OSDI.
[17] Dafna Shahaf, et al. Learning to Route, 2017, HotNets.
[18] Pongsakorn U.-Chupala, et al. ImageNet/ResNet-50 Training in 224 Seconds, 2018, ArXiv.
[19] Samy Bengio, et al. Revisiting Distributed Synchronous SGD, 2016, ArXiv.
[20] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[21] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[22] Tim Kraska, et al. The Case for Learned Index Structures, 2018.
[23] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[24] Tamiya Onodera, et al. Workload characterization and optimization of TPC-H queries on Apache Spark, 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[25] Carlo Curino, et al. Apache Hadoop YARN: yet another resource negotiator, 2013, SoCC.
[26] Min Zhu, et al. B4: experience with a globally-deployed software defined WAN, 2013, SIGCOMM.
[27] Amin Vahdat, et al. BwE: Flexible, Hierarchical Bandwidth Allocation for WAN Distributed Computing, 2015, Comput. Commun. Rev.
[28] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[29] Abhishek Verma, et al. Large-scale cluster management at Google with Borg, 2015, EuroSys.
[30] Martín Abadi, et al. Learning to Protect Communications with Adversarial Neural Cryptography, 2016, ArXiv.
[31] Ion Stoica, et al. Efficient coflow scheduling with Varys, 2015, SIGCOMM.
[32] Michael I. Jordan, et al. Ray: A Distributed Framework for Emerging AI Applications, 2017, OSDI.
[33] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[34] Randy H. Katz, et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, 2011, NSDI.
[35] Forrest N. Iandola, et al. FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).