LODatio: using a schema-level index to support users in finding relevant sources of linked data

The Linked Open Data (LOD) cloud provides a vast amount of heterogeneous data distributed over numerous data sources, which makes it difficult to find the sources relevant to a given information need. Existing search engines for the Semantic Web focus on instance-oriented information needs, i.e., searching for specific RDF instances or literals and exploring the search results. They do not address the question of finding Linked Data sources relevant to a schema-oriented information need, i.e., a query based on triple patterns that relate to a specific combination of RDF types and/or properties. In this paper, we present LODatio, a semantic search system that leverages a schema-level index to find sources of Linked Data relevant to a schema-oriented information need. Beyond retrieving relevant data sources, LODatio actively supports users in their schema-oriented search tasks: it provides ranked result lists of relevant data sources together with example snippets and an estimate of the result set size. Furthermore, LODatio supports novel features in semantic search, such as recommending alternative queries that refine or broaden the result set.
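
To make the idea of a schema-oriented lookup concrete, the following is a minimal in-memory sketch in Python. It is not the LODatio implementation: the function names (`build_schema_index`, `find_sources`), the example quads, and the ranking by instance count are all assumptions made for illustration. The sketch groups each instance by its combination of RDF types and properties, records which source it came from, and answers a query for a type/property combination with a list of sources and a per-source instance count, which serves as a crude estimate of result set size.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # abbreviated for readability (assumption)

def build_schema_index(quads):
    """Map each (types, properties) combination to per-source instance counts.

    `quads` is an iterable of (subject, predicate, object, source) tuples.
    """
    types = defaultdict(set)
    props = defaultdict(set)
    for s, p, o, src in quads:
        key = (s, src)
        if p == RDF_TYPE:
            types[key].add(o)
        else:
            props[key].add(p)

    index = defaultdict(lambda: defaultdict(int))
    for key in set(types) | set(props):
        _, src = key
        schema = (frozenset(types[key]), frozenset(props[key]))
        index[schema][src] += 1
    return index

def find_sources(index, want_types=(), want_props=()):
    """Return (source, matching_instance_count) pairs, largest count first.

    An instance matches if it has at least the requested types and properties.
    Ranking by count is a placeholder, not the ranking used by any real system.
    """
    want_types, want_props = set(want_types), set(want_props)
    counts = defaultdict(int)
    for (ts, ps), per_source in index.items():
        if want_types <= ts and want_props <= ps:
            for src, n in per_source.items():
                counts[src] += n
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical example data: three made-up sources.
SRC1 = "http://example.org/src1"
SRC2 = "http://example.org/src2"
SRC3 = "http://example.org/src3"
EXAMPLE_QUADS = [
    ("ex:a1", RDF_TYPE, "foaf:Person", SRC1),
    ("ex:a1", "foaf:name", "Alice", SRC1),
    ("ex:a2", RDF_TYPE, "foaf:Person", SRC1),
    ("ex:a2", "foaf:name", "Bob", SRC1),
    ("ex:b1", RDF_TYPE, "foaf:Person", SRC2),
    ("ex:b1", "foaf:name", "Carol", SRC2),
    ("ex:b1", "foaf:mbox", "mailto:carol@example.org", SRC2),
    ("ex:c1", RDF_TYPE, "foaf:Document", SRC3),
]
EXAMPLE_INDEX = build_schema_index(EXAMPLE_QUADS)

if __name__ == "__main__":
    # Which sources describe persons that have a name?
    print(find_sources(EXAMPLE_INDEX, ["foaf:Person"], ["foaf:name"]))
```

Adding a property to the query (for example `foaf:mbox`) narrows the result to fewer sources, and dropping one broadens it, which gives a simple intuition for refining or broadening a schema-oriented query.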
