Efficient Selection and Integration of Data Sources for Answering Semantic Web Queries

In this work we adapt an efficient information integration algorithm to identify the minimal set of potentially relevant Semantic Web data sources for a given query. The vast majority of these sources are files written in RDF or OWL format, and each must be processed in its entirety. Our adaptation enhances the algorithm with taxonomic reasoning, defines and uses a mapping language for aligning heterogeneous Semantic Web ontologies, and introduces a concept of source relevance to reduce the number of sources that need to be considered for a given query. After the source selection process, we load the selected sources into a Semantic Web reasoner to get a sound and complete answer to the query. We have conducted an experiment using synthetic ontologies and data sources, which demonstrates that our system performs well over a wide range of queries. A typical response time for a substantial workload of 50 domain ontologies, 80 map ontologies and 500 data sources is less than 2 seconds. Furthermore, our system returned correct answers to 200 randomly generated queries in several workload configurations. We have also compared our adaptation with a basic implementation of the original information integration algorithm that does not do any taxonomic reasoning. In the most complex configuration, with 50 domain ontologies, 100 map ontologies and 1000 data sources, our system returns complete answers to all the queries, whereas the basic implementation returns complete answers to only 28% of the queries.
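The effect of taxonomic reasoning on source selection can be illustrated with a small sketch. This is not the algorithm of the paper, only a simplified illustration under stated assumptions: each source is assumed to advertise the classes it contains instances of, a query is reduced to a set of class names, and a source counts as relevant when it advertises a queried class or one of its subclasses. The names `subclass_closure`, `relevant_sources`, `source_index` and the example ontology are all hypothetical.

```python
from collections import defaultdict


def subclass_closure(subclass_of):
    """Map each class to the set containing itself and all transitive subclasses.

    `subclass_of` is an iterable of (subclass, superclass) pairs (hypothetical input format).
    """
    children = defaultdict(set)
    classes = set()
    for sub, sup in subclass_of:
        children[sup].add(sub)
        classes.update((sub, sup))

    closure = {}
    for cls in classes:
        seen = {cls}
        stack = [cls]
        while stack:  # depth-first walk down the subclass hierarchy
            current = stack.pop()
            for child in children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        closure[cls] = seen
    return closure


def relevant_sources(query_classes, source_index, subclass_of, use_taxonomy=True):
    """Return the sources that may contribute instances of any queried class.

    With use_taxonomy=False only exact class matches count, which mimics a
    selection step that performs no taxonomic reasoning.
    """
    closure = subclass_closure(subclass_of) if use_taxonomy else {}
    selected = set()
    for query_class in query_classes:
        targets = closure.get(query_class, {query_class})
        for source, advertised in source_index.items():
            if advertised & targets:
                selected.add(source)
    return selected


# Hypothetical example data: a tiny class hierarchy and three sources.
subclass_of = [("GradStudent", "Student"), ("Student", "Person")]
source_index = {
    "a.rdf": {"GradStudent"},
    "b.rdf": {"Person"},
    "c.rdf": {"Course"},
}

with_taxonomy = relevant_sources({"Student"}, source_index, subclass_of)
without_taxonomy = relevant_sources(
    {"Student"}, source_index, subclass_of, use_taxonomy=False
)
```

In this toy setting a query for `Student` selects `a.rdf` only when the subclass hierarchy is consulted. Without it no source is selected and the answer is incomplete, which is the kind of gap the comparison with the basic implementation measures.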
