LLAMA: Efficient graph analytics using Large Multiversioned Arrays

We present LLAMA, a graph storage and analysis system that supports mutability and out-of-memory execution. LLAMA performs comparably to immutable main-memory analysis systems for graphs that fit in memory and significantly outperforms existing out-of-memory analysis systems for graphs that exceed main memory. LLAMA bases its implementation on the compressed sparse row (CSR) representation, which is a read-only representation commonly used for graph analytics. We augment this representation to support mutability and persistence using a novel implementation of multi-versioned array snapshots, making it ideal for applications that receive a steady stream of new data, but need to perform whole-graph analysis on consistent views of the data. We compare LLAMA to state-of-the-art systems on representative graph analysis workloads, showing that LLAMA scales well both out-of-memory and across parallel cores. Our evaluation shows that LLAMA's mutability introduces modest overheads of 3–18% relative to immutable CSR for in-memory execution and that it outperforms state-of-the-art out-of-memory systems in most cases, with a best case improvement of 5x on breadth-first-search.

[1]  Keshav Pingali,et al.  A lightweight infrastructure for graph analytics , 2013, SOSP.

[2]  Ming Wu,et al.  Managing Large Graphs on Multi-Cores with Graph Awareness , 2012, USENIX Annual Technical Conference.

[3]  Michael Stonebraker,et al.  OLTP through the looking glass, and what we found there , 2008, SIGMOD Conference.

[4]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[5]  Josep-Lluís Larriba-Pey,et al.  Efficient graph management based on bitmap indices , 2012, IDEAS '12.

[6]  Andrei Z. Broder,et al.  The Connectivity Server: Fast Access to Linkage Information on the Web , 1998, Comput. Networks.

[7]  Wenguang Chen,et al.  Chronos: a graph engine for temporal graph analysis , 2014, EuroSys '14.

[8]  Haixun Wang,et al.  Trinity: a distributed graph engine on a memory cloud , 2013, SIGMOD '13.

[9]  Lei Yang,et al.  Link analysis using time series of web graphs , 2007, CIKM '07.

[10]  Christos Faloutsos,et al.  PEGASUS: mining peta-scale graphs , 2011, Knowledge and Information Systems.

[11]  Joseph M. Hellerstein,et al.  GraphLab: A New Framework For Parallel Machine Learning , 2010, UAI.

[12]  Guy E. Blelloch,et al.  GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[13]  David A. Bader,et al.  STINGER : Spatio-Temporal Interaction Networks and Graphs ( STING ) Extensible Representation , 2009 .

[14]  Kunle Olukotun,et al.  Green-Marl: a DSL for easy and efficient graph analysis , 2012, ASPLOS XVII.

[15]  Maged M. Michael Hazard pointers: safe memory reclamation for lock-free objects , 2004, IEEE Transactions on Parallel and Distributed Systems.

[16]  Duen Horng Chau,et al.  Leveraging memory mapping for fast and scalable graph computation on a PC , 2013, 2013 IEEE International Conference on Big Data.

[17]  Alexander S. Szalay,et al.  A Parallel Page Cache: IOPS and Caching for Multicore Systems , 2012, HotStorage.

[18]  Jon M. Kleinberg,et al.  Group formation in large social networks: membership, growth, and evolution , 2006, KDD '06.

[19]  Udayan Khurana,et al.  Efficient snapshot retrieval over historical graph data , 2012, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[20]  Joseph M. Hellerstein,et al.  Distributed GraphLab: A Framework for Machine Learning in the Cloud , 2012, Proc. VLDB Endow..

[21]  Christos Faloutsos,et al.  Graphs over time: densification laws, shrinking diameters and possible explanations , 2005, KDD '05.

[22]  Christos Faloutsos,et al.  R-MAT: A Recursive Model for Graph Mining , 2004, SDM.

[23]  S. Sudarshan,et al.  Incremental Organization for Data Recording and Warehousing , 1997, VLDB.

[24]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[25]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[26]  Patrick E. O'Neil,et al.  The log-structured merge-tree (LSM-tree) , 1996, Acta Informatica.

[27]  Willy Zwaenepoel,et al.  X-Stream: edge-centric graph processing using streaming partitions , 2013, SOSP.

[28]  Guy E. Blelloch,et al.  Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.

[29]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[30]  Jinha Kim,et al.  TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC , 2013, KDD.