MGG: Accelerating Graph Neural Networks with Fine-Grained Intra-Kernel Communication-Computation Pipelining on Multi-GPU Platforms
Yuke Wang | Tong Geng | Zheng Wang | Ang Li | Boyuan Feng | Yufei Ding | K. Barker
[1] Yunru Bai,et al. PiPAD: Pipelined and Parallel Dynamic GNN Training on GPUs , 2023, PPoPP.
[2] D. Mudigere,et al. EL-Rec: Efficient Large-Scale Recommendation Model Training via Tensor-Train Embedding Table , 2022, SC22: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Jingren Zhou,et al. GNNLab: a factored system for sample-based GNN training over GPUs , 2022, EuroSys.
[4] Ping Luo,et al. vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training , 2022, IEEE Transactions on Parallel and Distributed Systems.
[5] Katherine A. Yelick,et al. CloudBank: Managed Services to Simplify Cloud Access for Computer Science Research and Education , 2021, PEARC.
[6] James Cheng,et al. DGCL: an efficient communication library for distributed GNN training , 2021, EuroSys.
[7] Wen-mei W. Hwu,et al. Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture , 2021, Proc. VLDB Endow..
[8] Jinjun Xiong,et al. PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses , 2021, ArXiv.
[9] Yunxin Liu,et al. PaGraph: Scaling GNN training on large graphs via computation-aware caching , 2020, SoCC.
[10] D. Narayanan,et al. Memory-Efficient Pipeline-Parallel DNN Training , 2020, ICML.
[11] Lei Deng,et al. GNNAdvisor: An Efficient Runtime System for GNN Acceleration on GPUs , 2020, ArXiv.
[12] J. Leskovec,et al. Open Graph Benchmark: Datasets for Machine Learning on Graphs , 2020, NeurIPS.
[13] Yufeng Zhang,et al. Every Document Owns Its Structure: Inductive Text Classification via Graph Neural Networks , 2020, ACL.
[14] Alexander Aiken,et al. Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc , 2020, MLSys.
[15] Ramyad Hadidi,et al. Batch-Aware Unified Memory Management in GPUs for Irregular Workloads , 2020, ASPLOS.
[16] Dongrui Fan,et al. HyGCN: A GCN Accelerator with Hybrid Architecture , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[17] Nikhil R. Devanur,et al. PipeDream: generalized pipeline parallelism for DNN training , 2019, SOSP.
[18] Alex Smola,et al. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs , 2019, ArXiv.
[19] Yafei Dai,et al. NeuGraph: Parallel Deep Neural Network Computation on Large Graphs , 2019, USENIX ATC.
[20] Carole-Jean Wu,et al. The Architectural Implications of Facebook's DNN-Based Personalized Recommendation , 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[21] Yinghai Lu,et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems , 2019, ArXiv.
[22] Xu Liu,et al. Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect , 2019, IEEE Transactions on Parallel and Distributed Systems.
[23] Jure Leskovec,et al. How Powerful are Graph Neural Networks? , 2018, ICLR.
[24] Jure Leskovec,et al. Hierarchical Graph Representation Learning with Differentiable Pooling , 2018, NeurIPS.
[25] Cao Xiao,et al. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling , 2018, ICLR.
[26] Pietro Liò,et al. Graph Attention Networks , 2017, ICLR.
[27] Mathias Niepert,et al. Learning Graph Representations with Embedding Propagation , 2017, NIPS.
[28] Lina Yao,et al. Deep Learning Based Recommender System , 2017, ACM Comput. Surv..
[29] Jure Leskovec,et al. Inductive Representation Learning on Large Graphs , 2017, NIPS.
[30] Wenguang Chen,et al. Gemini: A Computation-Centric Distributed Graph Processing System , 2016, OSDI.
[31] Weimin Zheng,et al. Exploring the Hidden Dimension in Graph Processing , 2016, OSDI.
[32] Max Welling,et al. Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.
[33] Jure Leskovec,et al. node2vec: Scalable Feature Learning for Networks , 2016, KDD.
[34] Makoto Onizuka,et al. Rabbit Order: Just-in-Time Parallel Reordering for Fast Graph Analysis , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[35] Jure Leskovec,et al. SNAP Datasets: Stanford Large Network Dataset Collection , 2014 .
[36] Steven Skiena,et al. DeepWalk: online learning of social representations , 2014, KDD.
[37] Steven Derrien,et al. Runtime dependency analysis for loop pipelining in High-Level Synthesis , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[38] Pedro F. Miret,et al. Wikipedia , 2008, Monatsschrift für Deutsches Recht.
[39] Alexander Aiken,et al. Legion: Expressing locality and independence with logical regions , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[40] Ernest Valveny,et al. Graph embedding in vector spaces by node attribute statistics , 2012, Pattern Recognit..
[41] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[42] Kaspar Riesen,et al. Graph Classification and Clustering Based on Vector Space Embedding , 2010, Series in Machine Perception and Artificial Intelligence.
[43] Srikanta J. Bedathur,et al. Towards time-aware link prediction in evolving social networks , 2009, SNA-KDD '09.
[44] Jérôme Kunegis,et al. Learning spectral graph transformations for link prediction , 2009, ICML '09.
[45] Wen-mei W. Hwu,et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.
[46] Hsinchun Chen,et al. Link prediction approach to collaborative filtering , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).
[47] Anand Padmanabha Iyer,et al. P3: Distributed Deep Graph Learning at Scale , 2021, OSDI.
[48] Algorithms, Theory , 2006 .