A Structural Approach to Indexing Triples

As an essential part of the W3C's semantic web stack and linked data initiative, RDF data management systems (also known as triplestores) have attracted considerable research attention. Most of these systems use value-based indexes (e.g., B+-trees) for physical storage and ignore many of the structural aspects present in RDF graphs. Structural indexes, on the other hand, have been successfully applied in XML and semi-structured data management to exploit structural graph information in query processing. In those settings, a structural index groups the nodes of a graph according to some equivalence criterion, for example, indistinguishability with respect to some query workload (usually XPath). Motivated by this body of work, we have started the SAINT-DB project to study and develop a native RDF management system based on structural indexes. In this paper we present a principled framework for designing and using RDF structural indexes for practical fragments of SPARQL, based on recent formal structural characterizations of these fragments. We then explain how structural indexes can be incorporated into a typical query processing workflow, and discuss the design, implementation, and initial empirical evaluation of our approach.
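To make the idea of grouping nodes by an equivalence criterion concrete, the following is a minimal sketch of a node-based structural index built by partition refinement. It uses forward bisimilarity over labeled edges as the equivalence, which is an assumption borrowed from the XML and semi-structured setting mentioned above; it is not the SAINT-DB construction, and the function name `structural_index` and the sample triples are hypothetical.

```python
from collections import defaultdict


def structural_index(triples, max_rounds=None):
    """Group nodes into blocks of forward-bisimilar nodes (illustrative only).

    triples: iterable of (subject, predicate, object) strings.
    max_rounds: optional cap on refinement rounds (gives a k-bisimulation-style
    approximation); None refines until the partition is stable.
    """
    nodes = set()
    out_edges = defaultdict(list)
    for s, p, o in triples:
        nodes.update((s, o))
        out_edges[s].append((p, o))

    # Start with every node in the same block, then split repeatedly.
    block = {n: 0 for n in nodes}
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        # A node's signature is its current block plus the set of
        # (predicate, block of target) pairs on its outgoing edges.
        signature = {
            n: (block[n], frozenset((p, block[o]) for p, o in out_edges[n]))
            for n in nodes
        }
        ids = {}
        refined = {n: ids.setdefault(signature[n], len(ids)) for n in sorted(nodes)}
        rounds += 1
        # Refinement only ever splits blocks, so an unchanged block count
        # means the partition is stable.
        if len(ids) == len(set(block.values())):
            break
        block = refined

    groups = defaultdict(set)
    for n, b in block.items():
        groups[b].add(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sample data: alice and carol have the same outgoing structure.
sample = [
    ("alice", "knows", "bob"),
    ("carol", "knows", "dave"),
    ("bob", "type", "Person"),
    ("dave", "type", "Person"),
]
index = structural_index(sample)
```

In this sketch a query processor could evaluate a structural pattern against the (usually much smaller) set of blocks first, and then inspect only the triples whose nodes fall in matching blocks. The choice of equivalence determines which query fragment the index is sound for, which is the design question the framework above addresses.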
