Accelerate Model Parallel Training by Using Efficient Graph Traversal Order in Device Placement
Amir H. Payberah | Vladimir Vlassov | Desta Haileselassie Hagos | Tianze Wang