Norman A. Rink | Michael Schaarschmidt | Dimitrios Vytiniotis | Adam Paszke | Dominik Grewe | Dan Belov | Tamara Norman | Georg Stefan Schmid | James Molloy | Jonathan Godwin | Vinod Nair
[1] Quoc V. Le, et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, 2018, ArXiv.
[2] Minjie Wang, et al. Supporting Very Large Models using Automatic Dataflow Graph Partitioning, 2018, EuroSys.
[3] Phitchaya Mangpo Phothilimthana, et al. Transferable Graph Optimizers for ML Compilers, 2020, NeurIPS.
[4] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[5] Mohammad Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[6] Christopher De Sa, et al. PipeMare: Asynchronous Pipeline Parallel DNN Training, 2019, ArXiv.
[7] Razvan Pascanu, et al. Interaction Networks for Learning about Objects, Relations and Physics, 2016, NIPS.
[8] Noam Shazeer, et al. GSPMD: General and Scalable Parallelization for ML Computation Graphs, 2021, ArXiv.
[9] Olatunji Ruwase, et al. ZeRO-Offload: Democratizing Billion-Scale Model Training, 2021, USENIX ATC.
[10] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018, OSDI.
[11] Alexander Aiken, et al. Beyond Data and Model Parallelism for Deep Neural Networks, 2018, SysML.
[12] Ryota Tomioka, et al. DistIR: An Intermediate Representation for Optimizing Distributed Neural Networks, 2021, EuroMLSys@EuroSys.
[13] Nikhil R. Devanur, et al. PipeDream: Generalized Pipeline Parallelism for DNN Training, 2019, SOSP.
[14] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[15] Max Willsey, et al. Equality Saturation for Tensor Graph Superoptimization, 2021, ArXiv.
[16] David A. Patterson, et al. In-Datacenter Performance Analysis of a Tensor Processing Unit, 2017, ISCA.
[17] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[18] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[19] Samy Bengio, et al. Device Placement Optimization with Reinforcement Learning, 2017, ICML.
[20] Amar Phanishayee, et al. Efficient Large-Scale Language Model Training on GPU Clusters, 2021, ArXiv.
[21] Wojciech Zaremba, et al. Evaluating Large Language Models Trained on Code, 2021, ArXiv.
[22] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[23] Uday Bondhugula, et al. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation, 2021, CGO.
[24] Alexander Aiken, et al. Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc, 2020, MLSys.
[25] Quoc V. Le, et al. A Hierarchical Model for Device Placement, 2018, ICLR.
[26] Dimitrios Vytiniotis, et al. Declarative Abstractions for Tensor Program Partitioning, 2020, PPDP.
[27] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[28] Amar Phanishayee, et al. Memory-Efficient Pipeline-Parallel DNN Training, 2021, ICML.