FedSearch: Efficiently Combining Structured Queries and Full-Text Search in a SPARQL Federation

Combining structured queries with full-text search provides a powerful means to access distributed linked data. However, executing hybrid search queries in a federation of multiple data sources presents a number of challenges due to data source heterogeneity and the lack of statistical data about keyword selectivity. To address these challenges, we present FedSearch, a novel hybrid query engine based on the SPARQL federation framework FedX. We extend the SPARQL algebra to incorporate keyword search clauses as first-class citizens and apply novel optimization techniques to improve query processing efficiency while maintaining a meaningful ranking of results. By adapting the query execution plan on the fly and intelligently grouping query clauses, we significantly reduce communication costs, making our approach suitable for top-k hybrid search across multiple data sources. In experiments, we demonstrate that our optimization techniques can lead to a substantial performance improvement, reducing the execution time of hybrid queries by more than an order of magnitude.
