ADWISE: Adaptive Window-Based Streaming Edge Partitioning for High-Speed Graph Processing

In recent years, graph partitioning has gained importance as a mandatory preprocessing step for distributed graph processing on very large graphs. Existing graph partitioning algorithms minimize partitioning latency by assigning individual graph edges to partitions in a streaming manner, at the cost of reduced partitioning quality. However, we argue that merely minimizing partitioning latency is not the optimal design choice for minimizing total graph analysis latency, i.e., the sum of partitioning and processing latency. Instead, for complex and long-running graph processing algorithms that run on very large graphs, it is beneficial to invest more time in graph partitioning to reach a higher partitioning quality, which drastically reduces graph processing latency. In this paper, we propose ADWISE, a novel window-based streaming partitioning algorithm that increases partitioning quality by always choosing the best edge from a set of edges for assignment to a partition. In doing so, ADWISE controls the partitioning latency by adapting the window size dynamically at run-time. Our evaluations show that ADWISE can reach the sweet spot between graph partitioning latency and graph processing latency, reducing the total latency of partitioning plus processing by up to 23-47 percent compared to the state of the art.
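The window-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the ADWISE implementation: the scoring function (replica locality plus a load-balance term), the fixed `window_size`, and the names `window_partition` and `replication_factor` are all assumptions made for illustration, and the run-time adaptation of the window size is left out. It only shows the difference between assigning each arriving edge immediately and choosing the best edge from a buffered set.

```python
from collections import defaultdict
from itertools import islice


def window_partition(edge_stream, k, window_size=4):
    """Assign streamed edges to k partitions, always taking the
    best-scoring (edge, partition) pair from a buffered window.

    Hypothetical sketch: the score below is an assumed greedy
    heuristic, not the scoring function used by ADWISE.
    """
    edges = iter(edge_stream)
    window = list(islice(edges, window_size))  # initial fill of the window
    replicas = defaultdict(set)  # vertex -> partitions holding a replica
    load = [0] * k               # number of edges per partition
    assignment = []              # (edge, partition) in assignment order

    def score(edge, p):
        u, v = edge
        # Locality: reward partitions that already hold u and/or v.
        locality = (p in replicas.get(u, ())) + (p in replicas.get(v, ()))
        # Balance: reward lightly loaded partitions; value lies in [0, 1).
        max_load = max(load)
        balance = (max_load - load[p]) / (max_load + 1)
        return locality + balance

    while window:
        best = None
        for i, edge in enumerate(window):
            for p in range(k):
                s = score(edge, p)
                if best is None or s > best[0]:
                    best = (s, i, p)
        _, i, p = best
        u, v = window.pop(i)
        assignment.append(((u, v), p))
        replicas[u].add(p)
        replicas[v].add(p)
        load[p] += 1
        # Refill the window with the next edge from the stream, if any.
        nxt = next(edges, None)
        if nxt is not None:
            window.append(nxt)
    return assignment, replicas


def replication_factor(replicas):
    """Average number of partitions per vertex (lower is better)."""
    if not replicas:
        return 0.0
    return sum(len(parts) for parts in replicas.values()) / len(replicas)


if __name__ == "__main__":
    stream = [(1, 2), (3, 4), (2, 3), (1, 3), (2, 4), (1, 4)]
    assignment, replicas = window_partition(stream, k=2, window_size=3)
    print(assignment)
    print(replication_factor(replicas))
```

With `window_size=1` the sketch degenerates to plain single-edge streaming assignment. A larger window evaluates more candidate edges per step, which costs more partitioning time in exchange for potentially better placement; that is the trade-off the abstract describes.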
