A Single-Shot Generalized Device Placement for Large Dataflow Graphs
Azalia Mirhoseini, James Laudon, Yanqi Zhou, Amirali Abdolrashidi, Qiumin Xu, Daniel Lin-Kit Wong, Sudip Roy, Peter Ma