GraphOne: A Data Store for Real-time Analytics on Evolving Graphs

There is a growing need to perform real-time analytics on evolving graphs in order to deliver the value of big data to users. The key requirement of such applications is a data store that supports their diverse data access patterns efficiently while concurrently ingesting fine-grained updates at high velocity. Unfortunately, current graph systems, whether graph databases or analytics engines, are not designed to achieve high performance for both operations. To address this challenge, we have designed and developed GRAPHONE, a graph data store that combines two complementary graph storage formats (edge list and adjacency list) and uses dual versioning to decouple graph computations from updates. Importantly, it presents a new data abstraction, GraphView, to enable data access at two different granularities with only a small amount of data duplication. Experimental results show that GRAPHONE achieves an ingestion rate two to three orders of magnitude higher than graph databases, while delivering algorithmic performance comparable to a static graph system. Compared to a state-of-the-art dynamic graph system, GRAPHONE delivers a 5.36× higher update rate and over 3× better analytics performance.
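
The hybrid storage idea described above can be illustrated with a small sketch. This is not GRAPHONE's implementation: the class and method names (`HybridGraphStore`, `GraphViewSketch`, `archive`, `recent_edges`) are hypothetical, the code is single-threaded, and edge deletion is ignored. It only shows, under those assumptions, how an append-only edge list can absorb fine-grained updates, how a batch step can move them into adjacency lists, and how a view pinned to a log position keeps reading a stable version while updates continue.

```python
from collections import defaultdict


class HybridGraphStore:
    """Toy store (hypothetical): new edges go to an append-only edge list;
    a batch step archives them into per-vertex adjacency lists."""

    def __init__(self):
        self.edge_log = []                  # edges in arrival order
        self.adjacency = defaultdict(list)  # vertex -> [(log_index, dst), ...]
        self.archived_upto = 0              # log positions below this are archived

    def add_edge(self, src, dst):
        # Ingestion is a plain append; no adjacency list is touched.
        self.edge_log.append((src, dst))

    def archive(self):
        # Move all not-yet-archived edges into the adjacency lists in one batch.
        for idx in range(self.archived_upto, len(self.edge_log)):
            src, dst = self.edge_log[idx]
            self.adjacency[src].append((idx, dst))
        self.archived_upto = len(self.edge_log)

    def view(self):
        # A view is pinned to the log length at creation time.
        return GraphViewSketch(self, len(self.edge_log))


class GraphViewSketch:
    """Read-only view (hypothetical) that ignores edges added after its creation."""

    def __init__(self, store, log_len):
        self.store = store
        self.log_len = log_len

    def recent_edges(self):
        # Edge-granularity access: edges visible to this view but not yet archived.
        return self.store.edge_log[self.store.archived_upto:self.log_len]

    def neighbors(self, v):
        # Vertex-granularity access: archived neighbors within this view's
        # version, followed by matching edges still sitting in the log.
        archived = [dst for idx, dst in self.store.adjacency.get(v, [])
                    if idx < self.log_len]
        pending = [dst for src, dst in self.recent_edges() if src == v]
        return archived + pending
```

In this sketch a view created before an `add_edge` call never sees that edge, whether or not `archive` runs in between, because every adjacency entry carries the log index it came from. A real system would also need concurrency control and memory management, which are out of scope here.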
