PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs

Natural graphs with skewed distributions pose unique challenges for distributed graph computation and partitioning. Existing graph-parallel systems usually use a “one-size-fits-all” design that processes all vertices uniformly, which either suffers from notable load imbalance and high contention for high-degree vertices (e.g., Pregel and GraphLab) or incurs high communication cost and memory consumption even for low-degree vertices (e.g., PowerGraph and GraphX). In this article, we argue that skewed distributions in natural graphs also necessitate differentiated processing of high-degree and low-degree vertices. We then introduce PowerLyra, a new distributed graph processing system that combines the best of both worlds of existing graph-parallel systems. Specifically, PowerLyra uses centralized computation for low-degree vertices to avoid frequent communication and distributes the computation for high-degree vertices to balance workloads. PowerLyra further provides an efficient hybrid graph partitioning algorithm (i.e., hybrid-cut) that combines edge-cut (for low-degree vertices) and vertex-cut (for high-degree vertices) with heuristics. To improve cache locality of inter-node graph accesses, PowerLyra also provides a locality-conscious data layout optimization. PowerLyra is implemented on top of the latest GraphLab and can seamlessly support various graph algorithms running in both synchronous and asynchronous execution modes. A detailed evaluation on three clusters using various graph-analytics and MLDM (Machine Learning and Data Mining) applications shows that PowerLyra outperforms PowerGraph by 1.24X to 5.53X for real-world graphs and by 1.49X to 3.26X for synthetic graphs, and is much faster than other systems such as GraphX and Giraph while consuming much less memory. Porting hybrid-cut to GraphX further confirms the efficiency and generality of PowerLyra.
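The hybrid-cut idea described above can be illustrated with a minimal Python sketch. It is not PowerLyra's implementation: the function name `hybrid_cut`, the `threshold` parameter, the use of in-degree to separate high-degree from low-degree vertices, and the modulo placement rule are all assumptions made for illustration. Under those assumptions, every in-edge of a low-degree vertex lands on the partition that owns that vertex (edge-cut style), while the in-edges of a high-degree vertex are spread by source vertex (vertex-cut style).

```python
from collections import Counter, defaultdict

def hybrid_cut(edges, num_parts, threshold):
    """Assign each directed edge (src, dst) to one of num_parts partitions.

    Illustrative sketch only; the degree test and placement rule are assumptions.
    """
    # In-degree of every target vertex decides which strategy applies to it.
    in_degree = Counter(dst for _, dst in edges)
    parts = defaultdict(list)
    for src, dst in edges:
        if in_degree[dst] <= threshold:
            # Low-degree target: keep all of its in-edges with the target,
            # so its computation can stay on one machine.
            owner = dst % num_parts
        else:
            # High-degree target: spread its in-edges by source vertex
            # so the work for the hub is shared across machines.
            owner = src % num_parts
        parts[owner].append((src, dst))
    return dict(parts)

def partitions_holding_in_edges(parts, vertex):
    """Return the set of partitions that store at least one in-edge of vertex."""
    return {p for p, es in parts.items() if any(dst == vertex for _, dst in es)}

# Hypothetical toy graph: vertex 0 is a hub with six in-edges,
# vertex 1 is a low-degree vertex with two in-edges.
edges = [(s, 0) for s in range(1, 7)] + [(2, 1), (3, 1)]
parts = hybrid_cut(edges, num_parts=3, threshold=3)

hub_spread = partitions_holding_in_edges(parts, 0)
low_spread = partitions_holding_in_edges(parts, 1)
```

In this toy graph the hub's in-edges end up on all three partitions, whereas both in-edges of vertex 1 stay on a single partition, which is the differentiated treatment the abstract argues for.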
