Boosting Graph Embedding on a Single GPU

Graphs are ubiquitous, and they can model unique characteristics and complex relations of real-life systems. Although machine learning (ML) on graphs is promising, the raw representation of a graph is not suitable for ML algorithms. Graph embedding represents each node of a graph as a d-dimensional vector, which is more suitable for ML tasks. However, the embedding process is expensive, and CPU-based tools do not scale to real-world graphs. In this work, we present GOSH, a GPU-based tool for embedding large-scale graphs with minimal hardware constraints. GOSH employs a novel graph coarsening algorithm to enhance the impact of updates and minimize the work for embedding. It also incorporates a decomposition schema that enables any arbitrarily large graph to be embedded with a single GPU. As a result, GOSH sets a new state of the art in link prediction in both accuracy and speed, and delivers high-quality embeddings for node classification in a fraction of the time required by the state of the art. For instance, it can embed a graph with over 65 million vertices and 1.8 billion edges in less than 30 minutes on a single GPU.
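
To make the idea of coarsening concrete, the following is a minimal sketch of a generic greedy-matching coarsening step. It is not GOSH's algorithm, whose details are not given here. The function names `coarsen` and `project` are hypothetical, and copying a super-vertex's vector to its members is an assumption about how a coarse embedding might seed a finer one in a multilevel scheme.

```python
from collections import defaultdict

def coarsen(num_vertices, edges):
    """Merge each unmatched vertex with its first unmatched neighbor.

    Generic illustration only; NOT the coarsening algorithm used by GOSH.
    Returns (mapping, coarse_edges, num_super_vertices).
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    mapping = [-1] * num_vertices  # fine vertex -> super-vertex id
    next_id = 0
    for u in range(num_vertices):
        if mapping[u] != -1:
            continue
        mapping[u] = next_id
        for v in adj[u]:
            if mapping[v] == -1:
                mapping[v] = next_id  # u and v collapse into one super-vertex
                break
        next_id += 1

    # Keep only edges that connect two different super-vertices.
    coarse_edges = set()
    for u, v in edges:
        cu, cv = mapping[u], mapping[v]
        if cu != cv:
            coarse_edges.add((min(cu, cv), max(cu, cv)))
    return mapping, sorted(coarse_edges), next_id

def project(mapping, coarse_embedding):
    """Assumed projection step: each fine vertex copies its super-vertex vector."""
    return [list(coarse_embedding[s]) for s in mapping]

# A path graph 0-1-2-3 collapses into two super-vertices joined by one edge.
mapping, coarse_edges, k = coarsen(4, [(0, 1), (1, 2), (2, 3)])
fine_embedding = project(mapping, [[0.1, 0.2], [0.3, 0.4]])  # d = 2
```

In this sketch a single update to a super-vertex's vector affects every fine vertex mapped to it, which is one way to read the claim that coarsening enhances the impact of updates while reducing the amount of embedding work.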
