Paper Information - FedSearch: Efficiently Combining Structured Queries and Full-Text Search in a SPARQL Federation

FedSearch: Efficiently Combining Structured Queries and Full-Text Search in a SPARQL Federation

Combining structured queries with full-text search provides a powerful means to access distributed linked data. However, executing hybrid search queries in a federation of multiple data sources presents a number of challenges due to data source heterogeneity and the lack of statistical data about keyword selectivity. To address these challenges, we present FedSearch, a novel hybrid query engine based on the SPARQL federation framework FedX. We extend the SPARQL algebra to incorporate keyword search clauses as first-class citizens and apply novel optimization techniques to improve query processing efficiency while maintaining a meaningful ranking of results. By performing on-the-fly adaptation of the query execution plan and intelligent grouping of query clauses, we are able to significantly reduce communication costs, making our approach suitable for top-k hybrid search across multiple data sources. In experiments, we demonstrate that our optimization techniques can lead to a substantial performance improvement, reducing the execution time of hybrid queries by more than an order of magnitude.

Andriy Nikolov | Christian Hütter | Andreas Schwarte
