AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning
Hao Zhang | Yuan Li | Zhijie Deng | Xiaodan Liang | Lawrence Carin | Eric P. Xing
[1] Eric P. Xing,et al. GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server , 2016, EuroSys.
[2] Alexander Sergeev,et al. Horovod: fast and easy distributed deep learning in TensorFlow , 2018, ArXiv.
[3] Trishul M. Chilimbi,et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System , 2014, OSDI.
[4] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[5] Hao Ma,et al. GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs , 2018, UAI.
[6] Alexander Aiken,et al. Beyond Data and Model Parallelism for Deep Neural Networks , 2018, SysML.
[7] Kalyanmoy Deb,et al. A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..
[8] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2016, CVPR.
[9] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[10] Quoc V. Le,et al. A Hierarchical Model for Device Placement , 2018, ICLR.
[11] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2017, CVPR.
[12] William J. Dally,et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training , 2017, ICLR.
[13] Charles R. Qi,et al. Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks , 2018, ICML.
[14] Alexander J. Smola,et al. Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.
[15] Yaoliang Yu,et al. Orpheus: Efficient Distributed Machine Learning via System and Algorithm Co-design , 2018, SoCC.
[16] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[17] Eric P. Xing,et al. Managed communication and consistency for fast data-parallel iterative analytics , 2015, SoCC.
[18] Yaoliang Yu,et al. Distributed Machine Learning via Sufficient Factor Broadcasting , 2015, ArXiv.
[19] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[20] Chishu Shibata. Understand it in 5 minutes!? Skimming famous papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020 .
[21] Yibo Zhu,et al. A generic communication scheduler for distributed DNN training acceleration , 2019, SOSP.
[22] Thierry Moreau,et al. Learning to Optimize Tensor Programs , 2018, NeurIPS.
[23] Nikhil R. Devanur,et al. PipeDream: Fast and Efficient Pipeline Parallel DNN Training , 2018, ArXiv.
[24] Alexander J. Smola,et al. Communication Efficient Distributed Machine Learning with the Parameter Server , 2014, NIPS.
[25] Peter I. Frazier,et al. A Tutorial on Bayesian Optimization , 2018, ArXiv.
[26] Tat-Seng Chua,et al. Neural Collaborative Filtering , 2017, WWW.
[27] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[28] Minjie Wang,et al. Supporting Very Large Models using Automatic Dataflow Graph Partitioning , 2018, EuroSys.
[29] Seunghak Lee,et al. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server , 2013, NIPS.
[30] Samy Bengio,et al. Device Placement Optimization with Reinforcement Learning , 2017, ICML.
[31] Eric P. Xing,et al. AutoLoss: Learning Discrete Schedules for Alternate Optimization , 2018, ICLR.
[32] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[33] Byung-Gon Chun,et al. Parallax: Automatic Data-Parallel Training of Deep Neural Networks , 2018, ArXiv.
[34] Tie-Yan Liu,et al. Learning to rank: from pairwise approach to listwise approach , 2007, ICML '07.
[35] Quoc V. Le,et al. Neural Architecture Search with Reinforcement Learning , 2016, ICLR.
[36] Pengtao Xie,et al. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters , 2017, USENIX Annual Technical Conference.