A New Frontier for Pull-Based Graph Processing

The trade-off between pull-based and push-based graph processing engines is well-understood. On one hand, pull-based engines can achieve higher throughput because their workloads are read-dominant, rather than write-dominant, and can proceed without synchronization between threads. On the other hand, push-based engines are much better able to take advantage of the frontier optimization, which leverages the fact that often only a small subset of the graph needs to be accessed to complete an iteration of a graph processing application. Hybrid engines attempt to overcome this trade-off by dynamically switching between push and pull, but there are two key disadvantages with this approach. First, applications must be implemented twice (once for push and once for pull), and second, processing throughput is reduced for iterations that run with push. We propose a radically different solution: rebuild the frontier optimization entirely such that it is well-suited for a pull-based engine. In doing so, we remove the only advantage that a push-based engine had over a pull-based engine, making it possible to eliminate the push-based engine entirely. We introduce Wedge, a pull-only graph processing framework that transforms the traditional source-oriented vertex-based frontier into a pull-friendly format called the Wedge Frontier. The transformation itself is expensive even when parallelized, so we introduce two key optimizations to make it practical. First, we perform the transformation only when the resulting Wedge Frontier is sufficiently sparse. Second, we coarsen the granularity of the representation of elements in the Wedge Frontier. These optimizations respectively improve Wedge's performance by up to 5x and 2x, enabling it to outperform Grazelle, Ligra, and GraphMat respectively by up to 2.8x, 4.9x, and 185.5x.

[1]  Binyu Zang,et al.  PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs , 2019, TOPC.

[2]  GraphIt - A High-Performance DSL for Graph Analytics , 2018, ArXiv.

[3]  Christoforos E. Kozyrakis,et al.  Making pull-based graph processing performant , 2018, PPoPP.

[4]  Dimitrios S. Nikolopoulos,et al.  Accelerating Graph Analytics by Utilising the Memory Locality of Graph Partitioning , 2017, 2017 46th International Conference on Parallel Processing (ICPP).

[5]  Yafei Dai,et al.  Garaph: Efficient GPU-accelerated Graph Processing on a Single Machine with Balanced Replication , 2017, USENIX Annual Technical Conference.

[6]  Torsten Hoefler,et al.  To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations , 2017, HPDC.

[7]  Mario Szegedy,et al.  A Simple Yet Effective Balanced Edge Partition Model for Parallel Computing , 2017, SIGMETRICS.

[8]  Matei Zaharia,et al.  Making caches work for graph analytics , 2016, 2017 IEEE International Conference on Big Data (Big Data).

[9]  Weimin Zheng,et al.  Exploring the Hidden Dimension in Graph Processing , 2016, OSDI.

[10]  Margaret Martonosi,et al.  Graphicionado: A high-performance and energy-efficient accelerator for graph analytics , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[11]  David A. Patterson,et al.  Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server , 2015, 2015 IEEE International Symposium on Workload Characterization.

[12]  Willy Zwaenepoel,et al.  Chaos: scale-out graph processing from secondary storage , 2015, SOSP.

[13]  Wencong Xiao,et al.  GraM: scaling graph computation to the trillions , 2015, SoCC.

[14]  Wenguang Chen,et al.  GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning , 2015, USENIX ATC.

[15]  Gagan Agrawal,et al.  Efficient and Simplified Parallel Graph Processing over CPU and MIC , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.

[16]  Guy E. Blelloch,et al.  Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+ , 2015, 2015 Data Compression Conference.

[17]  Pradeep Dubey,et al.  GraphMat: High performance graph analytics made productive , 2015, Proc. VLDB Endow..

[18]  Haibo Chen,et al.  NUMA-aware graph-structured analytics , 2015, PPoPP.

[19]  Haibo Chen,et al.  SYNC or ASYNC: time to fuse for distributed graph-parallel computation , 2015, PPoPP.

[20]  John D. Owens,et al.  Gunrock: a high-performance graph processing library on the GPU , 2015, PPoPP.

[21]  Alexander S. Szalay,et al.  FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs , 2014, FAST.

[22]  David A. Bader,et al.  Scalable and High Performance Betweenness Centrality on the GPU , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.

[23]  Reynold Xin,et al.  GraphX: Graph Processing in a Distributed Dataflow Framework , 2014, OSDI.

[24]  M. Tamer Özsu,et al.  An Experimental Comparison of Pregel-like Graph Processing Systems , 2014, Proc. VLDB Endow..

[25]  Jure Leskovec,et al.  {SNAP Datasets}: {Stanford} Large Network Dataset Collection , 2014 .

[26]  Alberto Montresor,et al.  An evaluation study of BigData frameworks for graph processing , 2013, 2013 IEEE International Conference on Big Data.

[27]  Willy Zwaenepoel,et al.  X-Stream: edge-centric graph processing using streaming partitions , 2013, SOSP.

[28]  Keshav Pingali,et al.  A lightweight infrastructure for graph analytics , 2013, SOSP.

[29]  Jennifer Widom,et al.  GPS: a graph processing system , 2013, SSDBM.

[30]  Panos Kalnis,et al.  Mizan: a system for dynamic load balancing in large-scale graph processing , 2013, EuroSys '13.

[31]  Guy E. Blelloch,et al.  Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.

[32]  David A. Patterson,et al.  Direction-optimizing Breadth-First Search , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[33]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[34]  Guy E. Blelloch,et al.  GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[35]  David A. Bader,et al.  A Fast Algorithm for Streaming Betweenness Centrality , 2012, 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Confernece on Social Computing.

[36]  Ming Wu,et al.  Managing Large Graphs on Multi-Cores with Graph Awareness , 2012, USENIX Annual Technical Conference.

[37]  Carlos Guestrin,et al.  Distributed GraphLab : A Framework for Machine Learning and Data Mining in the Cloud , 2012 .

[38]  Marco Rosa,et al.  Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks , 2010, WWW.

[39]  D. Patterson,et al.  Searching for a Parent Instead of Fighting Over Children : A Fast Breadth-First Search Implementation for Graph 500 , 2011 .

[40]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[41]  Christos Faloutsos,et al.  PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[42]  Sebastiano Vigna,et al.  The webgraph framework I: compression techniques , 2004, WWW '04.

[43]  U. Brandes A faster algorithm for betweenness centrality , 2001 .

[44]  Katherine A. Yelick,et al.  Optimizing parallel programs with explicit synchronization , 1995, PLDI '95.

[45]  Anthony P. Reeves,et al.  Strategies for Dynamic Load Balancing on Highly Parallel Computers , 1993, IEEE Trans. Parallel Distributed Syst..

[46]  Eli Upfal,et al.  A simple load balancing scheme for task allocation in parallel machines , 1991, SPAA '91.

[47]  Leslie G. Valiant,et al.  A bridging model for parallel computation , 1990, CACM.