ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints

Following the design rules of Linked Data, the number of available SPARQL endpoints that support remote query processing is quickly growing; however, because of the lack of adaptivity, query executions may frequently be unsuccessful. First, fixed plans identified following the traditional optimize-thenexecute paradigm, may timeout as a consequence of endpoint availability. Second, because blocking operators are usually implemented, endpoint query engines are not able to incrementally produce results, and may become blocked if data sources stop sending data. We present ANAPSID, an adaptive query engine for SPARQL endpoints that adapts query execution schedulers to data availability and run-time conditions. ANAPSID provides physical SPARQL operators that detect when a source becomes blocked or data traffic is bursty, and opportunistically, the operators produce results as quickly as data arrives from the sources. Additionally, ANAPSID operators implement main memory replacement policies to move previously computed matches to secondary memory avoiding duplicates. We compared ANAPSID performance with respect to RDF stores and endpoints, and observed that ANAPSID speeds up execution time, in some cases, in more than one order of magnitude.

[1]  Jeff Heflin,et al.  Using Reformulation Trees to Optimize Queries over Distributed Heterogeneous Sources , 2010, International Semantic Web Conference.

[2]  James A. Hendler,et al.  Matrix "Bit" loaded: a scalable lightweight join query processor for RDF data , 2010, WWW '10.

[3]  Jürgen Umbrich,et al.  Data summaries for on-demand queries over linked data , 2010, WWW '10.

[4]  Óscar Corcho,et al.  Semantics and Optimization of the SPARQL 1.1 Federation Extension , 2011, ESWC.

[5]  Dave Reynolds,et al.  SPARQL basic graph pattern optimization using selectivity estimation , 2008, WWW.

[6]  Maria-Esther Vidal,et al.  Efficiently Joining Group Patterns in SPARQL Queries , 2010, ESWC.

[7]  Ian Horrocks,et al.  The Semantic Web – ISWC 2010: 9th International Semantic Web Conference, ISWC 2010, Shanghai, China, November 7-11, 2010, Revised Selected Papers, Part I , 2010, SEMWEB.

[8]  Abraham Bernstein,et al.  The Semantic Web - ISWC 2009, 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25-29, 2009. Proceedings , 2009, SEMWEB.

[9]  Ioana Manolescu,et al.  Query optimization in the presence of limited access patterns , 1999, SIGMOD '99.

[10]  Günter Ladwig,et al.  Linked Data Query Processing Strategies , 2010, SEMWEB.

[11]  Abraham Bernstein,et al.  Avalanche: Putting the Spirit of the Web back into Semantic Web Querying , 2010, ISWC Posters&Demos.

[12]  Jeff Z. Pan,et al.  The Semanic Web: Research and Applications - 8th Extended Semantic Web Conference, ESWC 2011, Heraklion, Crete, Greece, May 29 - June 2, 2011, Proceedings, Part II , 2011, ESWC.

[13]  Lei Zhang,et al.  Summary Models for Routing Keywords to Linked Data Sources , 2010, International Semantic Web Conference.

[14]  Lora Aroyo,et al.  The Semantic Web: Research and Applications , 2009, Lecture Notes in Computer Science.

[15]  Maria-Esther Vidal,et al.  A sampling-based approach to identify QoS for web service orchestrations , 2010, iiWAS.

[16]  Ulf Leser,et al.  Querying Distributed RDF Data Sources with SPARQL , 2008, ESWC.

[17]  Günter Ladwig,et al.  SIHJoin: Querying Remote and Local Linked Data , 2011, ESWC.

[18]  Laurent Amsaleg,et al.  Cost-based query scrambling for initial delays , 1998, SIGMOD '98.

[19]  Maria-Esther Vidal,et al.  An Expressive and Efficient Solution to the Service Selection Problem , 2010, SEMWEB.

[20]  Gerhard Weikum,et al.  Scalable join processing on very large RDF graphs , 2009, SIGMOD Conference.

[21]  Zachary G. Ives,et al.  Adaptive query processing: Why, How, When, and What Next? , 2007, VLDB.

[22]  Martin L. Kersten,et al.  Self-organizing tuple reconstruction in column-stores , 2009, SIGMOD Conference.

[23]  Christian Bizer,et al.  Executing SPARQL Queries over the Web of Linked Data , 2009, SEMWEB.

[24]  Abraham Bernstein,et al.  Hexastore: sextuple indexing for semantic web data management , 2008, Proc. VLDB Endow..

[25]  Gio Wiederhold,et al.  Mediators in the architecture of future information systems , 1992, Computer.

[26]  Manolis Koubarakis,et al.  SPARQL Query Optimization on Top of DHTs , 2010, SEMWEB.

[27]  Olaf Hartig,et al.  Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution , 2011, ESWC.