IOGP: An Incremental Online Graph Partitioning Algorithm for Distributed Graph Databases

Graphs have become increasingly important in many applications and domains, such as querying relationships in social networks and managing rich metadata generated in scientific computing. Many of these use cases require high-performance distributed graph databases that serve continuous updates from clients while answering complex queries about the current graph. These operations, also referred to as online transaction processing (OLTP) operations, impose specific design and implementation requirements on graph partitioning algorithms. In this research, we argue that graph partitioning must account for both connectivity and changes in vertex degree. Based on this idea, we designed an Incremental Online Graph Partitioning (IOGP) algorithm that adapts to incremental changes in vertex degree. IOGP helps achieve better locality, generate balanced partitions, and increase the parallelism for accessing high-degree vertices of the graph. On both real-world and synthetic graphs, IOGP delivers up to 2x better query performance with less than 10% overhead compared with state-of-the-art graph partitioning algorithms.
