G-Store: A Storage Manager for Graph Data

Graph data is ubiquitous: social networks, the Semantic Web, pointer analysis in software engineering, and biological and chemical networks all rely on a graph representation of data. This paper makes the case for a native storage layer for graph data, rather than relying on relational or columnar stores. We propose G-Store, a lightweight storage manager for graph data. It exploits the structure of the graph to place data in pages in a way that is optimized for a wide range of access patterns found in graph queries. Our placement approach partitions the data into pages using a multilevel partitioning algorithm and arranges the pages on disk to minimize the on-disk distance between adjacent vertices. Initial experiments show that G-Store can outperform existing graph database solutions by orders of magnitude. We believe these results open a promising avenue of research into storage-aware graph databases, and we discuss some of these research directions.
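
The placement objective described above (keeping adjacent vertices close together on disk) can be illustrated with a minimal sketch. This is not G-Store's actual algorithm or cost function: the name `placement_cost`, the toy graph, and the page labels are assumptions made for illustration. The sketch only scores a given vertex-to-page assignment and page order by summing, over all edges, the distance in page slots between the two endpoints, in the spirit of the minimum linear arrangement problem.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def placement_cost(
    edges: Sequence[Edge],
    page_of: Dict[int, str],
    page_order: List[str],
) -> int:
    """Sum over all edges of the on-disk distance, measured in page
    slots, between the pages holding the two endpoints.

    Hypothetical helper for illustration; a lower value means adjacent
    vertices sit closer together on disk.
    """
    # Position of each page in the on-disk sequence.
    position = {page: slot for slot, page in enumerate(page_order)}
    return sum(
        abs(position[page_of[u]] - position[page_of[v]]) for u, v in edges
    )


# Toy example (assumed data): a path graph 0-1-2-3-4-5 split into
# three pages of two vertices each.
EDGES: List[Edge] = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
PAGE_OF: Dict[int, str] = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C"}

good_order = placement_cost(EDGES, PAGE_OF, ["A", "B", "C"])
bad_order = placement_cost(EDGES, PAGE_OF, ["A", "C", "B"])
print(good_order, bad_order)
```

In this toy case, the order that follows the path scores lower than the order that separates pages A and B, which is the kind of difference a page arrangement step would try to exploit.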
