Managing Structured and Semistructured RDF Data Using Structure Indexes

We propose the use of a structure index for RDF. It can be used to query RDF data whose schema is incomplete or unavailable. More importantly, we leverage it for a structure-oriented approach to RDF data partitioning and query processing. Based on the information captured by the structure index, similarly structured data elements are physically grouped and stored contiguously on disk. At query time, the index is used for "structure-level" processing, which identifies the groups of data that match the query structure. Structure-level processing is then combined with standard "data-level" operations, namely retrieval and join procedures executed against the data. In our experiments, the solution is several times faster than a state-of-the-art technique for data partitioning and query processing, and compares favorably with full-fledged RDF stores.
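
The two processing levels can be illustrated with a small Python sketch. It is not the paper's algorithm: as a simplifying assumption, two subjects count as "similarly structured" when they have the same set of outgoing predicates, which is a much coarser notion than a full structure index. The function names (`build_structure_index`, `partition`, `star_query`) and the sample triples are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

def build_structure_index(triples):
    """Map each structural signature (set of outgoing predicates) to its subjects.

    Assumption for this sketch: the signature is only the predicate set,
    not a bisimulation-style summary of the surrounding graph.
    """
    predicates_of = defaultdict(set)
    for s, p, _ in triples:
        predicates_of[s].add(p)
    groups = defaultdict(list)
    for s, preds in predicates_of.items():
        groups[frozenset(preds)].append(s)
    return dict(groups)

def partition(triples, index):
    """Group triples by the signature of their subject (stand-in for contiguous storage)."""
    signature_of = {s: sig for sig, subjects in index.items() for s in subjects}
    parts = defaultdict(list)
    for triple in triples:
        parts[signature_of[triple[0]]].append(triple)
    return dict(parts)

def star_query(parts, required_predicates):
    """Return, per subject, the objects of all required predicates."""
    required = frozenset(required_predicates)
    results = {}
    for signature, part in parts.items():
        # Structure-level step: skip groups that cannot match the query shape.
        if not required <= signature:
            continue
        # Data-level step: retrieve values only from the surviving groups.
        for s, p, o in part:
            if p in required:
                results.setdefault(s, {}).setdefault(p, []).append(o)
    return results

TRIPLES = [
    ("a", "name", "Alice"),
    ("a", "knows", "b"),
    ("b", "name", "Bob"),
    ("c", "title", "X"),
]

INDEX = build_structure_index(TRIPLES)
PARTS = partition(TRIPLES, INDEX)
print(star_query(PARTS, ["name", "knows"]))
```

In this toy data, only subject `a` has both `name` and `knows`, so the groups holding `b` and `c` are pruned at the structure level before any of their triples are read.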
