Catch the Wind: Graph workload balancing on cloud

Graph partitioning is a key issue in graph database processing systems for achieving high efficiency on Cloud. However, the balanced graph partitioning itself is difficult because it is known to be NP-complete. In addition a static graph partitioning cannot keep all graph algorithms efficient for a long time in parallel on Cloud because the workload balancing in different iterations for different graph algorithms are all possible different. In this paper, we investigate graph behaviors by exploring the working window (we call it wind) changes, where a working window is a set of active vertices that a graph algorithm really needs to access in parallel computing. We investigated nine classic graph algorithms using real datasets, and propose simple yet effective policies that can achieve both high graph workload balancing and efficient partition on Cloud.

[1]  Leslie G. Valiant,et al.  A bridging model for parallel computation , 1990, CACM.

[2]  Haixun Wang,et al.  Managing and mining large graphs: systems and implementations , 2012, SIGMOD Conference.

[3]  Michael Luby,et al.  A simple parallel algorithm for the maximal independent set problem , 1985, STOC '85.

[4]  Thomas E. Anderson,et al.  High-speed switch scheduling for local-area networks , 1993, TOCS.

[5]  David Peleg,et al.  Distributed Computing: A Locality-Sensitive Approach , 1987 .

[6]  George Karypis,et al.  Partitioning and Load Balancing for Emerging Parallel Applications and Architectures , 2006, Parallel Processing for Scientific Computing.

[7]  Bo Zong,et al.  Towards effective partition management for large graphs , 2012, SIGMOD Conference.

[8]  Gabriel Kliot,et al.  Streaming graph partitioning for large distributed graphs , 2012, KDD.

[9]  Xin-She Yang,et al.  Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.

[10]  Robert van Engelen,et al.  Graph Partitioning for High Performance Scienti c Simulations , 2000 .

[11]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[12]  Divyakant Agrawal,et al.  Zephyr: live migration in shared nothing databases for elastic cloud platforms , 2011, SIGMOD '11.

[13]  Amol Deshpande,et al.  Managing large dynamic graphs efficiently , 2012, SIGMOD Conference.

[14]  Rayleigh The Problem of the Random Walk , 1905, Nature.

[15]  Konstantin Andreev,et al.  Balanced Graph Partitioning , 2004, SPAA '04.

[16]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[17]  Carlo Curino,et al.  Lookup Tables: Fine-Grained Partitioning for Distributed Databases , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[18]  Nathan Linial,et al.  Locality in Distributed Graph Algorithms , 1992, SIAM J. Comput..

[19]  Ronald L. Rivest,et al.  Introduction to Algorithms, third edition , 2009 .

[20]  David S. Johnson,et al.  Some simplified NP-complete problems , 1974, STOC '74.

[21]  Vipin Kumar,et al.  A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..

[22]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[23]  Pierre A. Humblet,et al.  A Distributed Algorithm for Minimum-Weight Spanning Trees , 1983, TOPL.

[24]  Pablo Rodriguez,et al.  The little engine(s) that could: scaling online social networks , 2010, SIGCOMM '10.

[25]  Andrew V. Goldberg,et al.  Shortest paths algorithms: Theory and experimental evaluation , 1994, SODA '94.