Learning from the History of Distributed Query Processing - A Heretic View on Linked Data Management

The vision of the Semantic Web has triggered the development of various new applications and opened up new directions in research. Recently, much effort has been put into developing techniques for query processing over Linked Data. Because these techniques build on ones originally developed for distributed and federated databases, some of them inherit the same or similar problems. The goal of this paper is therefore to point out pitfalls that the previous generation of researchers has already encountered and to introduce Linked Data as a Service as an idea that has the potential to solve these problems in some scenarios. To that end, the paper discusses nine theses about Linked Data processing and sketches a research agenda for future work in this area.
