Enhancing Source Selection for Live Queries over Linked Data via Query Log Mining

Traditionally, Linked Data query engines execute SPARQL queries over a materialised repository, which guarantees fast query answering but requires time- and resource-consuming preprocessing. In addition, such repositories face the ongoing challenge of keeping their index up to date, which is practically infeasible given the size of the Web. As a result, the answers to a given SPARQL query are potentially outdated. Recent approaches address this result-freshness problem by answering a query directly over dereferenced, query-relevant Web documents. Our work investigates the problem of efficiently selecting query-relevant sources in this context. As a part of query optimization, source selection tries to estimate the minimum number of sources that must be accessed in order to answer a query. We propose to summarize and index sources based on frequently appearing query graph patterns mined from query logs. We verify the applicability of our approach and show empirically that it significantly reduces the estimated number of relevant sources while keeping the overhead low.
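To make the idea of indexing sources by frequent query patterns concrete, the following is a minimal sketch and not the method of the paper. It assumes a strong simplification: a "pattern" is just a set of predicates that co-occur in a logged query, whereas the abstract refers to graph patterns. The function names (mine_frequent_patterns, build_index, select_sources), the min_support and max_size parameters, and the example predicates are all hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_frequent_patterns(query_log, min_support=2, max_size=2):
    """Return predicate sets that occur in at least min_support logged queries.

    Assumption: each log entry is a list of predicate names; real query
    graph patterns would also encode how triple patterns are joined.
    """
    counts = Counter()
    for predicates in query_log:
        unique = sorted(set(predicates))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {pattern for pattern, n in counts.items() if n >= min_support}


def build_index(source_predicates, patterns):
    """Map each frequent pattern to the sources that contain all its predicates."""
    index = defaultdict(set)
    for source, predicates in source_predicates.items():
        available = set(predicates)
        for pattern in patterns:
            if pattern <= available:
                index[pattern].add(source)
    return index


def select_sources(query_predicates, index):
    """Return sources indexed under the largest patterns contained in the query.

    Returns None when no indexed pattern applies, so a caller can fall
    back to some other source selection strategy.
    """
    query = frozenset(query_predicates)
    matching = [pattern for pattern in index if pattern <= query]
    if not matching:
        return None
    largest = max(len(pattern) for pattern in matching)
    selected = set()
    for pattern in matching:
        if len(pattern) == largest:
            selected |= index[pattern]
    return selected
```

In this sketch, larger patterns are more selective: a query whose predicates match a two-predicate pattern is routed only to sources that contain both predicates, which is one simple way the estimated number of relevant sources could shrink.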
