AKIN: A Streaming Graph Partitioning Algorithm for Distributed Graph Storage Systems

Many graph-related applications face the challenge of managing excessive and ever-growing graph data in a distributed environment. Therefore, it is necessary to consider a graph partitioning algorithm to distribute graph data onto multiple machines as the data comes in. Balancing data distribution and minimizing edge-cut ratio are two basic pursuits of the graph partitioning problem. While achieving balanced partitions for streaming graphs is easy, existing graph partitioning algorithms either fail to work on streaming workloads, or leave edge-cut ratio to be further improved. Our research aims to provide a better solution that fits the need of streaming graph partitioning in a distributed system, which further reduces the edge-cut ratio while maintaining rough balance among all partitions. We exploit the similarity measure on the degree of vertices to gather structuralrelated vertices in the same partition as much as possible, this reduces the edge-cut ratio even further as compared to the state-of-the-art streaming graph partitioning algorithm - FENNEL. Our evaluation shows that our streaming graph partitioning algorithm is able to achieve better partitioning quality in terms of edge-cut ratio (up to 20% reduction as compared to FENNEL) while maintaining decent balance between all partitions, and such improvement applies to various real-life graphs.

[1]  Amir H. Payberah,et al.  Boosting Vertex-Cut Partitioning for Streaming Graphs , 2016, 2016 IEEE International Congress on Big Data (BigData Congress).

[2]  Michalis Vazirgiannis,et al.  Clustering and Community Detection in Directed Networks: A Survey , 2013, ArXiv.

[3]  Tom A. B. Snijders,et al.  Social Network Analysis , 2011, International Encyclopedia of Statistical Science.

[4]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[5]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[6]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[7]  K. Selçuk Candan,et al.  SBV-Cut: Vertex-cut based graph partitioning using structural balance vertices , 2012, Data Knowl. Eng..

[8]  Gabriel Kliot,et al.  Streaming graph partitioning for large distributed graphs , 2012, KDD.

[9]  Wei Zhang,et al.  GraphMeta: A Graph-Based Engine for Managing Large-Scale HPC Rich Metadata , 2016, 2016 IEEE International Conference on Cluster Computing (CLUSTER).

[10]  Curt Jones,et al.  Finding Good Approximate Vertex and Edge Partitions is NP-Hard , 1992, Inf. Process. Lett..

[11]  Mark E. J. Newman,et al.  The Structure and Function of Complex Networks , 2003, SIAM Rev..

[12]  Ronald Rousseau,et al.  Similarity measures in scientometric research: The Jaccard index versus Salton's cosine formula , 1989, Inf. Process. Manag..

[13]  Sudeep Sarkar,et al.  Supervised Learning of Large Perceptual Organization: Graph Spectral Partitioning and Learning Automata , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Reynold Xin,et al.  GraphX: a resilient distributed graph system on Spark , 2013, GRADES.

[15]  Binyu Zang,et al.  PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs , 2019, TOPC.

[16]  Charalampos E. Tsourakakis,et al.  FENNEL: streaming graph partitioning for massive scale graphs , 2014, WSDM.

[17]  Vipin Kumar,et al.  Parallel Multilevel series k-Way Partitioning Scheme for Irregular Graphs , 1999, SIAM Rev..

[18]  Peter Sanders,et al.  Recent Advances in Graph Partitioning , 2013, Algorithm Engineering.

[19]  Charu C. Aggarwal,et al.  Graph Clustering , 2010, Encyclopedia of Machine Learning and Data Mining.

[20]  Jure Leskovec,et al.  {SNAP Datasets}: {Stanford} Large Network Dataset Collection , 2014 .

[21]  Bruce Hendrickson,et al.  A Multi-Level Algorithm For Partitioning Graphs , 1995, Proceedings of the IEEE/ACM SC95 Conference.

[22]  David S. Johnson,et al.  Some simplified polynomial complete problems , 1974 .

[23]  Vipin Kumar,et al.  A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..

[24]  Robert B. Ross,et al.  Using Property Graphs for Rich Metadata Management in HPC Systems , 2014, 2014 9th Parallel Data Storage Workshop.