Controlling Memory Footprint of Stateful Streaming Graph Processing

To analyze dynamic graphs efficiently, streaming graph processing systems rely on stateful iterative models: they track the intermediate state as execution progresses so that results can be adjusted incrementally when the graph mutates. We observe that this intermediate state significantly increases the memory footprint of these systems, which limits their scalability on large graphs. In this paper, we develop memory-efficient stateful iterative models that require much less memory to process streaming graphs while delivering the same results as existing stateful iterative models. First, we propose a Selective Stateful Iterative Model, in which the memory footprint is controlled by maintaining only a small, selected portion of the intermediate state throughout execution. Then, we propose a Minimal Stateful Iterative Model that further reduces the memory footprint by exploiting key properties of graph algorithms. For both models, we develop incremental processing strategies that correctly compute the effects of graph mutations on the final results even when intermediate states are not available. Our evaluation shows that the memory-efficient models limit the memory footprint while retaining most of the performance benefits of traditional stateful iterative models, and therefore scale to larger graphs that the traditional models cannot handle.
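
The trade-off behind keeping only part of the intermediate state can be illustrated with a small sketch. This is not the paper's algorithm: the selection policy (retain every `keep_every`-th iteration), the PageRank-style `step` function, and all names below are assumptions made for illustration. The sketch shows only that an iteration's state that was not retained can be rebuilt by replaying from the nearest retained one, trading recomputation for memory.

```python
# Illustrative sketch only; names and the "every k-th iteration" policy are
# hypothetical and are not taken from the paper.

def step(in_edges, values, damping=0.85):
    """One synchronous iteration of a PageRank-style update."""
    n = len(values)
    out_deg = [0] * n
    for srcs in in_edges:
        for s in srcs:
            out_deg[s] += 1
    new_values = []
    for v in range(n):
        total = sum(values[u] / out_deg[u] for u in in_edges[v])
        new_values.append((1 - damping) / n + damping * total)
    return new_values

def run_full(in_edges, init, iters):
    """Traditional stateful model: retain the state of every iteration."""
    states = [list(init)]
    for _ in range(iters):
        states.append(step(in_edges, states[-1]))
    return states

def run_selective(in_edges, init, iters, keep_every):
    """Retain only iteration 0 and every keep_every-th iteration (assumed policy)."""
    kept = {0: list(init)}
    cur = list(init)
    for i in range(1, iters + 1):
        cur = step(in_edges, cur)
        if i % keep_every == 0:
            kept[i] = cur
    return kept, cur

def state_at(in_edges, kept, i):
    """Rebuild iteration i by replaying from the nearest retained iteration."""
    base = max(k for k in kept if k <= i)
    cur = kept[base]
    for _ in range(i - base):
        cur = step(in_edges, cur)
    return cur

# in_edges[v] lists the sources of edges pointing at vertex v.
in_edges = [[2], [0], [0, 1]]
init = [1 / 3] * 3
full = run_full(in_edges, init, iters=6)
kept, final = run_selective(in_edges, init, iters=6, keep_every=3)
```

Here `kept` holds three of the seven per-iteration states, and `state_at` recovers any of the others at the cost of at most `keep_every - 1` extra calls to `step`. Handling graph mutations incrementally on top of such partial state is the part the paper addresses and is not attempted in this sketch.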
