LEOPARD: Lightweight Edge-Oriented Partitioning and Replication for Dynamic Graphs

This paper introduces a dynamic graph partitioning algorithm designed for large, constantly changing graphs. We propose a partitioning framework that adjusts on the fly as the graph structure changes. We also introduce a replication algorithm, tightly integrated with the partitioning algorithm, that further reduces the number of edges cut. Although the proposed approach considers only local parts of the graph when reassigning vertices, extensive evaluation shows that it maintains a high-quality partitioning over time, comparable at any point in time to a full partitioning from scratch using a state-of-the-art static graph partitioning algorithm such as METIS. Furthermore, when vertex replication is turned on, edge-cut can improve by an order of magnitude.
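
To make the edge-cut metric concrete, the following minimal sketch counts cut edges on a toy graph, first with one copy of each vertex and then with a replica added. The function name `edge_cut`, the toy graph, and the rule that an edge counts as local whenever some partition holds a copy of both endpoints are illustrative assumptions, not the algorithm proposed in the paper.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def edge_cut(edges: Iterable[Edge], placement: Dict[int, Set[int]]) -> int:
    """Count edges whose endpoints share no partition.

    `placement` maps each vertex to the set of partitions that hold a copy
    of it (a single-element set when the vertex is not replicated).
    Assumption for this sketch: an edge is local if at least one partition
    holds a copy of both endpoints.
    """
    return sum(1 for u, v in edges if not (placement[u] & placement[v]))


# Hypothetical toy graph: a 4-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]

# No replication: vertices 0 and 1 on partition 0, vertices 2 and 3 on partition 1.
primary = {0: {0}, 1: {0}, 2: {1}, 3: {1}}
cut_without_replicas = edge_cut(edges, primary)

# Same assignment, plus a replica of vertex 1 on partition 1.
replicated = {0: {0}, 1: {0, 1}, 2: {1}, 3: {1}}
cut_with_replicas = edge_cut(edges, replicated)

print(cut_without_replicas)  # 3
print(cut_with_replicas)     # 1
```

In this toy case a single replica turns two of the three cut edges into local ones, which is the kind of effect the abstract attributes to vertex replication; the magnitude here says nothing about the paper's measured results.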
