DOGMA: A Disk-Oriented Graph Matching Algorithm for RDF Databases

RDF is an increasingly important paradigm for the representation of information on the Web. As RDF databases grow to tens of millions of triples, and as sophisticated graph matching queries expressible in languages like SPARQL become increasingly important, scalability becomes an issue. To date, there is no graph-based indexing method for RDF data whose index is designed to be disk-resident. There is therefore a growing need for indexes that can operate efficiently when the index itself resides on disk. In this paper, we first propose the DOGMA index for fast subgraph matching on disk and then develop a basic algorithm to answer queries over this index. We then significantly speed up this algorithm with an optimized version that applies efficient (but correct) pruning strategies in combination with two different extensions of the index. We have implemented a preliminary system and tested it against four existing RDF database systems developed by others. Our experiments show that our algorithm performs very well compared to these systems, with orders of magnitude improvements for complex graph queries.
