Selecting Materialized Views for RDF Data

In the design of a relational database, the administrator has to decide, given a fixed or estimated workload, which indexes should be created. This so-called index selection problem is a non-trivial optimization problem in relational databases. In this paper we describe a novel approach to index selection on RDF data sets. We propose an algorithm that automatically suggests a set of indexes, realized as materialized views, based on a workload of SPARQL queries. The selected set of indexes aims to decrease the cost of the workload. We provide a cost model to estimate the potential impact of candidate indexes on query performance and an algorithm to select an optimal set of indexes. This algorithm may be integrated into an existing SPARQL query engine. We experimentally evaluate our approach on a standard query processor. We claim that our approach is the first comprehensive proposal for solving the index selection problem in RDF.
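The abstract does not spell out the cost model or the selection algorithm, so the following is only a minimal sketch of the general idea of workload-driven view selection, not the method of the paper. It assumes a simple greedy heuristic under a storage budget, assumes each query uses at most one view, and uses made-up cost figures; the names `CandidateView`, `workload_cost`, and `greedy_select` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateView:
    """A hypothetical candidate materialized view (illustrative only)."""
    name: str
    size: int  # storage units needed to materialize the view (must be > 0)
    savings: dict = field(default_factory=dict)  # query id -> estimated cost saved


def workload_cost(queries, selected):
    """Estimated workload cost: each query uses the single most helpful view."""
    total = 0.0
    for qid, (base_cost, freq) in queries.items():
        best_saving = max((v.savings.get(qid, 0.0) for v in selected), default=0.0)
        total += freq * (base_cost - best_saving)
    return total


def greedy_select(queries, candidates, budget):
    """Repeatedly add the view with the best cost reduction per storage unit."""
    chosen = []
    remaining = budget
    current = workload_cost(queries, chosen)
    while True:
        best, best_ratio, best_cost = None, 0.0, current
        for view in candidates:
            if view in chosen or view.size > remaining:
                continue
            new_cost = workload_cost(queries, chosen + [view])
            ratio = (current - new_cost) / view.size
            if ratio > best_ratio:
                best, best_ratio, best_cost = view, ratio, new_cost
        if best is None:  # nothing fits or nothing helps any more
            return chosen, current
        chosen.append(best)
        remaining -= best.size
        current = best_cost


# Toy workload: query id -> (estimated cost without views, frequency).
queries = {"q1": (100.0, 10), "q2": (80.0, 5), "q3": (50.0, 1)}
candidates = [
    CandidateView("v1", size=4, savings={"q1": 60.0, "q2": 20.0}),
    CandidateView("v2", size=3, savings={"q2": 50.0}),
    CandidateView("v3", size=5, savings={"q1": 70.0, "q3": 30.0}),
]

selected, cost = greedy_select(queries, candidates, budget=7)
print([v.name for v in selected], cost)
```

A greedy heuristic of this kind is easy to follow but is not guaranteed to find an optimal set; an exact formulation would need a different technique.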
