Topic-based indexing of federated datasets

The increasing availability of open datasets calls for novel query engines that ease the retrieval of desired information from this large volume of data. This paper presents an in-depth analysis of a topic-based indexing solution for building transparent, federated query engines for SPARQL. To construct the index, we assume that RDF fragments can be retrieved from the distributed datasets. User queries are rewritten by the engine using the SERVICE keyword, recently introduced in SPARQL to support distributed queries. To perform the rewriting, we propose different strategies for discovering and selecting the dataset endpoints. The strategies are analyzed and compared throughout the paper using several benchmarks. The tests assess both the effectiveness and the performance of the proposed strategies. The results show significant improvements compared with other techniques.
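
As a rough illustration of the rewriting step described above, the following minimal sketch wraps a graph pattern in one SERVICE block per selected endpoint and joins the blocks with UNION. It is not the engine's actual rewriting procedure: the function names, the example endpoint URLs, and the choice of a plain UNION over all selected endpoints are assumptions made for illustration, and the endpoint selection itself (the topic-based index lookup) is taken as given input.

```python
from typing import List


def wrap_in_service(pattern: str, endpoints: List[str]) -> str:
    """Wrap a graph pattern in one SERVICE block per endpoint, joined by UNION.

    Hypothetical helper: the endpoints are assumed to have been selected
    beforehand (e.g. by an index lookup); no selection happens here.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    blocks = [f"{{ SERVICE <{url}> {{ {pattern} }} }}" for url in endpoints]
    return " UNION ".join(blocks)


def rewrite_query(variables: List[str], pattern: str, endpoints: List[str]) -> str:
    """Build a federated SELECT query from a pattern and the chosen endpoints."""
    projection = " ".join(f"?{v}" for v in variables)
    body = wrap_in_service(pattern, endpoints)
    return f"SELECT {projection} WHERE {{ {body} }}"


if __name__ == "__main__":
    # Placeholder endpoint URLs, not real services.
    selected = [
        "http://example.org/a/sparql",
        "http://example.org/b/sparql",
    ]
    print(rewrite_query(["s", "label"], "?s rdfs:label ?label .", selected))
```

The user writes only the inner pattern; the engine adds the SERVICE clauses, which is what makes the federation transparent from the user's point of view.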
