Understanding Coarsening for Embedding Large-Scale Graphs

A significant portion of today's data, e.g., social networks and web connections, can be modeled as graphs. A proper analysis of graphs with Machine Learning (ML) algorithms has the potential to yield far-reaching insights into many areas of research and industry. However, the irregular structure of graph data is an obstacle to running ML tasks such as link prediction, node classification, and anomaly detection on graphs. Graph embedding is a compute-intensive process that represents a graph as a set of vectors in a d-dimensional space, which in turn makes the graph amenable to ML tasks. Many approaches have been proposed in the literature to improve the performance of graph embedding, e.g., distributed algorithms, accelerators, and pre-processing techniques. Graph coarsening, which can be considered a pre-processing step, structurally approximates a given large graph with a smaller one. As the literature suggests, the cost of embedding decreases significantly when coarsening is employed. In this work, we thoroughly analyze the impact of coarsening quality on embedding performance in terms of both speed and accuracy. Our experiments with a state-of-the-art, fast graph embedding tool show that there is an interplay between the coarsening decisions taken and the embedding quality.
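To make the idea of coarsening concrete, the following is a minimal sketch of one coarsening level, assuming a simple greedy scheme that merges each unmatched vertex with its first unmatched neighbor. It is an illustration only and not necessarily the coarsening strategy studied in this work; the function name `coarsen` and its parameters are hypothetical.

```python
def coarsen(num_vertices, edges):
    """Collapse an undirected graph by one level using greedy matching.

    Assumption: vertices are integers 0..num_vertices-1 and `edges` is a
    list of (u, v) pairs. Returns (mapping, coarse_edges), where
    mapping[v] is the id of the super-vertex that contains v.
    """
    # Build adjacency lists, ignoring self-loops.
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        if u != v:
            adj[u].append(v)
            adj[v].append(u)

    mapping = [-1] * num_vertices  # -1 marks a vertex not yet merged
    next_id = 0
    for u in range(num_vertices):
        if mapping[u] != -1:
            continue
        mapping[u] = next_id
        # Merge u with its first still-unmatched neighbor, if any.
        for v in adj[u]:
            if mapping[v] == -1:
                mapping[v] = next_id
                break
        next_id += 1

    # Keep one edge per pair of distinct super-vertices.
    coarse_edges = set()
    for u, v in edges:
        cu, cv = mapping[u], mapping[v]
        if cu != cv:
            coarse_edges.add((min(cu, cv), max(cu, cv)))
    return mapping, sorted(coarse_edges)


# Usage: a path 0-1-2-3 shrinks to two super-vertices joined by one edge.
mapping, coarse_edges = coarsen(4, [(0, 1), (1, 2), (2, 3)])
```

Which vertices are merged at each level is exactly the kind of coarsening decision that may affect the quality of an embedding computed on the smaller graph.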
