Meghan Cowan | Todd Mytkowicz | Vijay Chidambaram | Olli Saarikivi | Rachee Singh | Saeed Maleki | Aashaka Shah | Madan Musuvathi | Jacob Nelson
[1] Roger W. Hockney, et al. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2, 1994, Parallel Computing.
[2] Jack J. Dongarra, et al. Performance analysis of MPI collective operations, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[3] Robert A. van de Geijn, et al. Collective communication: theory, practice, and experience, 2007, Concurr. Comput. Pract. Exp.
[4] Alexander Sergeev, et al. Horovod: fast and easy distributed deep learning in TensorFlow, 2018, ArXiv.
[5] Noam Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, ArXiv.
[6] Zhengyang Liu, et al. Synthesizing Optimal Collective Algorithms, 2020, ArXiv.
[7] D. S. Scott, et al. Efficient All-to-All Communication Patterns in Hypercube and Mesh Topologies, 1991, Proceedings of the Sixth Distributed Memory Computing Conference.
[8] Luis Ceze, et al. PLink: Discovering and Exploiting Locality for Accelerated Distributed Training on the Public Cloud, 2020, MLSys.
[9] Nikhil R. Devanur, et al. Blink: Fast and Generic Collectives for Distributed ML, 2019, MLSys.
[10] Orhan Firat, et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, 2020, ICLR.
[11] Robert A. van de Geijn, et al. Global combine on mesh architectures with wormhole routing, 1993, Proceedings Seventh International Parallel Processing Symposium.
[12] Shahid H. Bokhari, et al. Complete exchange on a circuit switched mesh, 1992, Proceedings Scalable High Performance Computing Conference SHPCC-92.
[13] Keijo Heljanko, et al. Improving Dynamic Partial Order Reductions for Concolic Testing, 2012, 12th International Conference on Application of Concurrency to System Design.
[14] Minsik Cho, et al. BlueConnect: Decomposing all-reduce for deep learning on heterogeneous network hierarchy, 2019, IBM J. Res. Dev.
[15] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[16] Rajeev Thakur, et al. Optimization of Collective Communication Operations in MPICH, 2005, Int. J. High Perform. Comput. Appl.
[17] Mohammad Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.