Querying over Federated SPARQL Endpoints - A State of the Art Survey

The increasing amount of Linked Data and its inherently distributed nature have attracted significant attention in recent years, both throughout the research community and amongst practitioners who need to search that data. Inspired by research results from traditional distributed databases, different approaches for managing federation over SPARQL Endpoints have been introduced. SPARQL is the standardised query language for RDF, the default data model used in Linked Data deployments, and SPARQL Endpoints are a popular access mechanism provided by many Linked Open Data (LOD) repositories. In this paper, we first give an overview of the federation framework infrastructure and then compare existing SPARQL federation frameworks. Finally, we highlight shortcomings in existing frameworks, which we hope will help spawn new research directions.
