Boosting Graph Embedding on a Single GPU

Graphs are ubiquitous, and they can model the unique characteristics and complex relations of real-life systems. Although applying machine learning (ML) to graphs is promising, the raw representation of a graph is not suitable for ML algorithms. Graph embedding represents each node of a graph as a d-dimensional vector, which is more suitable for ML tasks. However, the embedding process is expensive, and CPU-based tools do not scale to real-world graphs. In this work, we present GOSH, a GPU-based tool for embedding large-scale graphs with minimal hardware requirements. GOSH employs a novel graph coarsening algorithm to enhance the impact of updates and minimize the work required for embedding. It also incorporates a decomposition scheme that enables arbitrarily large graphs to be embedded with a single GPU. As a result, GOSH sets a new state of the art in link prediction in both accuracy and speed, and delivers high-quality embeddings for node classification in a fraction of the time required by state-of-the-art tools. For instance, it can embed a graph with over 65 million vertices and 1.8 billion edges in less than 30 minutes on a single GPU.
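The idea that coarsening "enhances the impact of updates" can be illustrated with a toy example: vertices are merged into super-vertices, the small coarse graph is embedded, and each super-vertex's vector is then copied back to all of its member vertices. The result is that a single update to a super-vertex effectively moves several original vertices at once. The following is a minimal sketch under stated assumptions. The neighbor-collapse rule and the degree-based visiting order are hypothetical heuristics chosen for illustration, not the coarsening algorithm GOSH actually uses. The functions `coarsen` and `project` are names introduced here.

```python
from collections import defaultdict


def coarsen(num_vertices, edges):
    """One level of neighbor-collapse coarsening (hypothetical rule, not GOSH's).

    Returns (mapping, num_super_vertices, coarse_edges), where mapping[v] is
    the id of the super-vertex that original vertex v was merged into.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    mapping = [-1] * num_vertices
    next_id = 0
    # Assumed heuristic: visit vertices by decreasing degree (ties keep index order).
    order = sorted(range(num_vertices), key=lambda x: -len(adj[x]))
    for u in order:
        if mapping[u] != -1:
            continue
        # u starts a new super-vertex and absorbs all of its still-unmapped neighbors.
        mapping[u] = next_id
        for w in adj[u]:
            if mapping[w] == -1:
                mapping[w] = next_id
        next_id += 1

    # Edges between distinct super-vertices form the coarse graph (deduplicated).
    coarse_edges = set()
    for u, v in edges:
        cu, cv = mapping[u], mapping[v]
        if cu != cv:
            coarse_edges.add((min(cu, cv), max(cu, cv)))
    return mapping, next_id, sorted(coarse_edges)


def project(mapping, coarse_emb):
    """Initialize each original vertex with a copy of its super-vertex's vector."""
    return [list(coarse_emb[c]) for c in mapping]
```

For the path graph 0-1-2-3-4, this rule yields two super-vertices, {0, 1, 2} and {3, 4}, joined by a single coarse edge. Embedding that 2-vertex graph and calling `project` gives all five original vertices a starting vector. A multilevel scheme would then refine these vectors on the finer graph instead of starting from random values.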

Kamer Kaya | Taha Atahan Akyildiz | Amro Alabsi Aljundi