GrapH: Traffic-Aware Graph Processing

Distributed graph processing systems such as Pregel, PowerGraph, and GraphX have gained popularity due to their superior performance for data analytics on graph-structured data. These systems employ partitioning algorithms to parallelize graph analytics while minimizing inter-partition communication. Recent partitioning algorithms, however, unrealistically assume a uniform and constant amount of data exchanged between graph vertices (i.e., uniform vertex traffic) and homogeneous network costs between the workers hosting the graph partitions. These assumptions lead to suboptimal partitioning decisions and inefficient graph processing. To address this, we developed GrapH, the first graph processing system using vertex-cut graph partitioning that considers both diverse vertex traffic and heterogeneous network costs. The main idea is to avoid frequent communication over expensive network links using an adaptive edge migration strategy. Our evaluations show that GrapH reduces graph processing latency by 10 percent and communication costs by 60 percent compared to state-of-the-art partitioning approaches.
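
To make the idea of traffic-aware vertex-cut partitioning concrete, the following is a minimal sketch and not GrapH's actual cost model or migration algorithm. It assumes that each edge is assigned to exactly one partition, that a vertex is replicated on every partition holding one of its edges, and that the replica on the lowest-numbered partition acts as master and exchanges `traffic[v]` units of data with every other replica. The names `replica_sets`, `communication_cost`, `migration_gain`, `traffic`, and `link_cost` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def replica_sets(edge_partition):
    """Map each vertex to the set of partitions holding at least one of its edges."""
    replicas = defaultdict(set)
    for (u, v), p in edge_partition.items():
        replicas[u].add(p)
        replicas[v].add(p)
    return replicas


def communication_cost(edge_partition, traffic, link_cost):
    """Sum, over all vertices, of vertex traffic times the link costs master -> mirrors.

    Assumption (illustrative only): the master replica of a vertex lives on
    the lowest-numbered partition among its replicas.
    """
    total = 0.0
    for v, parts in replica_sets(edge_partition).items():
        master = min(parts)
        total += traffic[v] * sum(link_cost[master][p] for p in parts if p != master)
    return total


def migration_gain(edge_partition, edge, target, traffic, link_cost):
    """Cost reduction obtained by moving one edge to partition `target`."""
    before = communication_cost(edge_partition, traffic, link_cost)
    moved = dict(edge_partition)  # copy so the caller's assignment is untouched
    moved[edge] = target
    return before - communication_cost(moved, traffic, link_cost)


# Toy setup: vertex "a" has high traffic and its edges are spread over three workers.
edge_partition = {("a", "b"): 0, ("a", "c"): 1, ("a", "d"): 2}
traffic = {"a": 5, "b": 1, "c": 1, "d": 1}
# Symmetric link costs: workers 0 and 1 are close, worker 2 sits behind an expensive link.
link_cost = [
    [0, 1, 10],
    [1, 0, 10],
    [10, 10, 0],
]

cost_before = communication_cost(edge_partition, traffic, link_cost)
gain = migration_gain(edge_partition, ("a", "d"), 1, traffic, link_cost)
print(cost_before, gain)
```

In this toy setup, moving the edge `("a", "d")` off the worker behind the expensive link removes the costly replica of the high-traffic vertex `a`. A partitioner that assumed uniform traffic and uniform link costs would not distinguish this migration from any other that removes one replica.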
