Executing SPARQL Queries over the Web of Linked Data

The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that traditional research on federated query processing does not address. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved via HTTP into RDF data that is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the challenges that remain.
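The ideas summarized above can be illustrated with a small sketch under strong simplifying assumptions. The Web is simulated by a hypothetical in-memory dictionary `WEB`, HTTP latency by a tick counter, and a URI lookup by the hypothetical `request`/`tick` methods; all names (`QueriedDataset`, `tp_iterator`, `execute`, the `ex:` terms) are illustrative, not taken from the paper. Each triple pattern of a basic graph pattern gets one iterator, and the iterators form a pipeline over a shared, growing queried dataset. Before matching, an iterator looks up the URIs in its pattern after applying the current partial solution, which is how RDF links get followed. With `postpone=False` an iterator waits for its lookups like a classical iterator; with `postpone=True` it sets the partial solution aside and pulls the next input instead. This postponing is only a simplified stand-in for the kind of iterator extension the abstract refers to, not the paper's exact algorithm.

```python
from collections import deque
from typing import Deque, Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical stand-in for the Web: the RDF data each URI dereferences to.
WEB: Dict[str, Set[Triple]] = {
    "ex:alice": {("ex:alice", "foaf:knows", "ex:bob"), ("ex:alice", "foaf:knows", "ex:carol")},
    "ex:bob": {("ex:bob", "foaf:name", "Bob")},
    "ex:carol": {("ex:carol", "foaf:name", "Carol")},
}


class QueriedDataset:
    """Growing set of retrieved triples; each lookup completes `latency` ticks after its request."""
    def __init__(self, latency: int = 2) -> None:
        self.latency, self.clock = latency, 0
        self.triples: Set[Triple] = set()
        self.pending: Dict[str, int] = {}  # URI -> tick at which its data arrives
        self.retrieved: Set[str] = set()

    def request(self, uri: str) -> None:
        # Start a simulated HTTP lookup without waiting for its result.
        if uri not in self.retrieved and uri not in self.pending:
            self.pending[uri] = self.clock + self.latency

    def tick(self) -> None:
        # Advance simulated time and add the data of every completed lookup.
        self.clock += 1
        for uri, due in list(self.pending.items()):
            if due <= self.clock:
                del self.pending[uri]
                self.retrieved.add(uri)
                self.triples |= WEB.get(uri, set())


def needed_uris(pattern: Triple, mu: Binding) -> List[str]:
    # URIs in the pattern after applying mu (assumed: '?'-prefixed = variable, ':' = URI).
    return [v for v in (mu.get(t, t) for t in pattern) if not v.startswith("?") and ":" in v]


def match(pattern: Triple, mu: Binding, triple: Triple) -> Optional[Binding]:
    # Extend mu so that the pattern equals the triple; None if that is impossible.
    ext = dict(mu)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if ext.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None
    return ext


def take_ready(postponed: Deque[Binding], pattern: Triple,
               ds: QueriedDataset) -> Optional[Binding]:
    # Remove and return the first postponed mapping whose lookups have all completed.
    for _ in range(len(postponed)):
        mu = postponed.popleft()
        if ds.retrieved.issuperset(needed_uris(pattern, mu)):
            return mu
        postponed.append(mu)
    return None


def tp_iterator(pattern: Triple, inputs: Iterator[Binding], ds: QueriedDataset,
                postpone: bool) -> Iterator[Binding]:
    postponed: Deque[Binding] = deque()
    exhausted = False
    while not exhausted or postponed:
        mu = take_ready(postponed, pattern, ds)
        if mu is None and not exhausted:
            nxt = next(inputs, None)
            if nxt is None:
                exhausted = True
                continue
            needed = needed_uris(pattern, nxt)
            for uri in needed:
                ds.request(uri)
            if postpone:
                postponed.append(nxt)  # extended iterator: move on to the next input
                continue
            while not ds.retrieved.issuperset(needed):
                ds.tick()  # classical iterator: block until the data has arrived
            mu = nxt
        if mu is None:
            ds.tick()  # inputs exhausted and nothing ready: waiting is unavoidable
            continue
        for triple in list(ds.triples):  # snapshot: downstream iterators may add triples
            ext = match(pattern, mu, triple)
            if ext is not None:
                yield ext


def execute(bgp: List[Triple], ds: QueriedDataset, postpone: bool = True) -> List[Binding]:
    stream: Iterator[Binding] = iter([{}])  # start from the empty solution mapping
    for pattern in bgp:
        stream = tp_iterator(pattern, stream, ds, postpone)
    return list(stream)
```

For the query `ex:alice foaf:knows ?x . ?x foaf:name ?n` with a latency of two ticks, both variants find the names Bob and Carol, whose data is only discovered by following the `foaf:knows` links from `ex:alice`. The postponing variant finishes at simulated tick 4 and the blocking one at tick 6, because postponing lets the lookups of `ex:bob` and `ex:carol` overlap instead of running one after the other. The order of results may vary, since triples are held in a set.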