DISTINGER: A distributed graph data structure for massive dynamic graph processing

Large, dynamic graphs with streaming updates have been gaining traction recently, along with the need to run graph analytics on a commodity cluster instead of a high-performance computing facility. Surprisingly, little work has studied how to scale out graph data structures to represent sparse dynamic graphs on a commodity cluster, and even the latest work [1], based on the most common in-memory graph representation, CSR [2], targets a single machine. In this paper we present DISTINGER, a distributed graph representation that handles massive graph analytics with streaming updates. DISTINGER extends a scale-up design to a scale-out graph data structure while maintaining its efficiency and scalability. We implement our design and algorithms as a prototype and compare it to single-site STINGER and state-of-the-art graph systems. Our experimental evaluation on a real cluster shows that DISTINGER can handle larger graphs than STINGER and performs graph tasks (PageRank and edge updates) more efficiently than GraphLab and Giraph.

[1] Jack Dongarra, et al. Templates for the Solution of Algebraic Eigenvalue Problems, 2000, Software, environments, tools.

[2] David A. Bader, et al. A performance evaluation of open source graph databases, 2014, PPAA '14.

[3] M. Tamer Özsu, et al. An Experimental Comparison of Pregel-like Graph Processing Systems, 2014, Proc. VLDB Endow.

[4] Amol Deshpande, et al. Managing large dynamic graphs efficiently, 2012, SIGMOD Conference.

[5] Liangjie Hong, et al. A time-dependent topic model for multiple text streams, 2011, KDD.

[6] Josep-Lluís Larriba-Pey, et al. Efficient graph management based on bitmap indices, 2012, IDEAS '12.

[7] Jon M. Kleinberg, et al. Group formation in large social networks: membership, growth, and evolution, 2006, KDD '06.

[8] Christos Faloutsos, et al. R-MAT: A Recursive Model for Graph Mining, 2004, SDM.

[9] Shirish Tatikonda, et al. From "Think Like a Vertex" to "Think Like a Graph", 2013, Proc. VLDB Endow.

[10] Panos Kalnis, et al. Mizan: a system for dynamic load balancing in large-scale graph processing, 2013, EuroSys '13.

[11] Aart J. C. Bik, et al. Pregel: a system for large-scale graph processing, 2010, SIGMOD Conference.

[12] Jennifer Widom, et al. GPS: a graph processing system, 2013, SSDBM.

[13] Joseph Gonzalez, et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, 2012, OSDI.

[14] Hamid Sarbazi-Azad, et al. Computational Graph Analytics for Massive Streaming Data, 2014.

[15] Virendra J. Marathe, et al. LLAMA: Efficient graph analytics using Large Multiversioned Arrays, 2015, 2015 IEEE 31st International Conference on Data Engineering.

[16] José Martins, et al. TwitterEcho: a distributed focused crawler to support open research with twitter data, 2012, WWW.

[17] David A. Bader, et al. STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation, 2009.

[18] Carlos Guestrin, et al. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, 2012.

[19] Peter Macko, et al. LLAMA: A Persistent, Mutable Representation for Graphs, 2015.

[20] David A. Bader, et al. Computational Graph Analytics for Massive Streaming Data, 2011.

[21] David A. Bader, et al. STINGER: High performance data structure for streaming graphs, 2012, 2012 IEEE Conference on High Performance Extreme Computing.

[22] Yogesh L. Simmhan, et al. GoFFish: A Sub-graph Centric Framework for Large-Scale Graph Analytics, 2013, Euro-Par.