DZiG: sparsity-aware incremental processing of streaming graphs

State-of-the-art streaming graph processing systems that provide Bulk Synchronous Parallel (BSP) guarantees remain oblivious to the computation sparsity present in iterative graph algorithms, which severely limits their performance. In this paper we propose DZiG, a high-performance streaming graph processing system that retains efficiency in the presence of sparse computations while still guaranteeing BSP semantics. At the heart of DZiG are: (1) a sparsity-aware incremental processing technique that expresses computations recursively so that updates can be safely identified and pruned (hence retaining sparsity); (2) a simple change-driven programming model that naturally exposes sparsity in iterative computations; and (3) an adaptive processing model that automatically changes the incremental computation strategy to limit its overheads when computations become very sparse. DZiG outperforms state-of-the-art streaming graph processing systems and pushes the boundary of dependency-driven processing for streaming graphs to over 10 million simultaneous mutations, which is orders of magnitude higher than what state-of-the-art systems support.
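
To make the notion of computation sparsity concrete, the following is a minimal sketch of a change-driven synchronous iteration for a PageRank-style computation on a static graph. It is not DZiG's API or algorithm: the function names, the adjacency-list representation, and the damping constant are all assumptions made for illustration, and the sketch does not handle graph mutations at all. It only shows that when vertices push the change in their value rather than the value itself, vertices whose values have stopped changing generate no work, while the results still match a dense BSP computation.

```python
def dense_pagerank(out_edges, iters, d=0.85):
    """Baseline BSP iteration: every edge is processed in every iteration."""
    n = len(out_edges)
    rank = [1.0] * n
    for _ in range(iters):
        agg = [0.0] * n
        for u, nbrs in enumerate(out_edges):
            for v in nbrs:
                agg[v] += rank[u] / len(nbrs)
        rank = [(1 - d) + d * a for a in agg]
    return rank


def change_driven_pagerank(out_edges, iters, d=0.85):
    """Hypothetical change-driven variant (illustration only, not DZiG).

    Each vertex keeps a running aggregate of its in-neighbours'
    contributions. Only vertices whose value changed in the previous
    iteration push a delta along their out-edges.
    Returns the ranks and the number of edges processed per iteration.
    """
    n = len(out_edges)
    rank = [1.0] * n
    agg = [0.0] * n
    # Initially every contribution goes from 0 to rank[u], so all are "changed".
    changed = {u: rank[u] for u in range(n)}
    work = []
    for it in range(iters):
        # In the first iteration every vertex must be evaluated once.
        touched = set(range(n)) if it == 0 else set()
        edge_visits = 0
        for u, delta in changed.items():
            nbrs = out_edges[u]
            for v in nbrs:
                agg[v] += delta / len(nbrs)
                touched.add(v)
                edge_visits += 1
        work.append(edge_visits)
        changed = {}
        for v in touched:
            new = (1 - d) + d * agg[v]
            if new != rank[v]:
                changed[v] = new - rank[v]
                rank[v] = new
    return rank, work


if __name__ == "__main__":
    # Small acyclic example graph (an assumption chosen so values stabilise).
    graph = [[1, 2], [2], [3], []]
    ranks, work = change_driven_pagerank(graph, iters=10)
    print(ranks)
    print(work)
```

On the small acyclic graph in the example, the per-iteration edge counts shrink to zero once values stabilise, whereas the dense version processes all four edges in each of the ten iterations. Handling edge insertions and deletions while preserving BSP semantics, which is the problem the paper addresses, requires additional machinery that this sketch omits.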
