GraphBolt: Dependency-Driven Synchronous Processing of Streaming Graphs

Efficient streaming graph processing systems leverage incremental processing by updating computed results to reflect changes in the graph structure for the latest graph snapshot. Although certain monotonic path-based algorithms produce correct results by refining intermediate values via numerical comparisons, directly reusing values that were computed before a mutation does not work correctly for algorithms that require BSP semantics. Since structural mutations in streaming graphs render the intermediate results unusable, exploiting incremental computation while simultaneously providing synchronous processing guarantees is challenging. In this paper, we develop GraphBolt, which incrementally processes streaming graphs while guaranteeing BSP semantics. GraphBolt incorporates dependency-driven incremental processing: it first tracks dependencies to capture how intermediate values are computed, and then uses this information to incrementally propagate the impact of a change across intermediate values. To support a wide variety of graph-based analytics, GraphBolt provides a generalized incremental programming model that enables the development of incremental versions of complex aggregations. Our evaluation shows that GraphBolt's incremental processing eliminates redundant computations and efficiently processes streaming graphs with varying mutation rates, from a single edge mutation up to 1 million edge mutations at a time. Furthermore, being specialized for graph computations, GraphBolt extracts high performance compared to Differential Dataflow.
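
The idea of tracking per-iteration aggregation values and refining them after a mutation can be illustrated with a minimal sketch. This is not GraphBolt's API or implementation: the function names (`run_bsp`, `refine`), the PageRank-style update rule, the fixed iteration count, and the damping constant are all assumptions chosen for illustration. The sketch stores the aggregation value of every vertex at every iteration, and after an edge mutation it retracts old contributions and adds new ones only for vertices whose out-edges or previous-iteration values changed, so the result matches a from-scratch synchronous run up to floating-point rounding.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed PageRank-style damping factor


def out_edges(edges):
    """Build an adjacency map {source: set of targets}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    return adj


def contrib(val, adj, u):
    """Share of u's value sent along each of its outgoing edges."""
    return val[u] / len(adj[u]) if adj[u] else 0.0


def run_bsp(vertices, edges, iters):
    """From-scratch synchronous run that records the tracked state:
    aggs[i][v] is the aggregation at v in iteration i, vals[i][v] its value."""
    adj = out_edges(edges)
    vals = [{v: 1.0 for v in vertices}]
    aggs = [None]  # no aggregation for the initial values
    for i in range(1, iters + 1):
        agg = {v: 0.0 for v in vertices}
        for u in vertices:
            for v in adj[u]:
                agg[v] += contrib(vals[i - 1], adj, u)
        aggs.append(agg)
        vals.append({v: 1 - DAMPING + DAMPING * agg[v] for v in vertices})
    return aggs, vals


def refine(vertices, old_edges, new_edges, aggs, vals, iters):
    """Incrementally refine tracked aggregations after a batch of mutations."""
    old_adj, new_adj = out_edges(old_edges), out_edges(new_edges)
    # Vertices whose outgoing edge set changed must be revisited every iteration.
    mutated = {u for u in vertices if old_adj[u] != new_adj[u]}
    new_vals = [dict(vals[0])]
    new_aggs = [None]
    changed = set()  # vertices whose value differed in the previous iteration
    for i in range(1, iters + 1):
        agg = dict(aggs[i])  # start from the tracked aggregation
        touched = set()
        for u in mutated | changed:
            old_c = contrib(vals[i - 1], old_adj, u)
            new_c = contrib(new_vals[i - 1], new_adj, u)
            for v in old_adj[u]:  # retract the stale contribution
                agg[v] -= old_c
                touched.add(v)
            for v in new_adj[u]:  # add the updated contribution
                agg[v] += new_c
                touched.add(v)
        val = dict(vals[i])
        changed = set()
        for v in touched:
            nv = 1 - DAMPING + DAMPING * agg[v]
            if nv != val[v]:
                val[v] = nv
                changed.add(v)
        new_aggs.append(agg)
        new_vals.append(val)
    return new_aggs, new_vals


if __name__ == "__main__":
    verts = [0, 1, 2, 3]
    old = {(0, 1), (1, 2), (2, 0), (2, 3)}
    new = (old | {(3, 0)}) - {(2, 0)}  # one edge added, one edge deleted
    aggs, vals = run_bsp(verts, old, 5)
    _, inc = refine(verts, old, new, aggs, vals, 5)
    _, ref = run_bsp(verts, new, 5)
    print(max(abs(inc[5][v] - ref[5][v]) for v in verts))
```

Because each iteration is refined against the recorded state of the previous iteration, the sketch preserves synchronous semantics; simply restarting from the final pre-mutation values would not. The cost of this approach is the memory needed to keep one aggregation value per vertex per iteration.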
