Semantic Query Transformation for Integrating Web Information Sources

The heterogeneity and dynamics of web information sources are the major challenges to Internet-scale information integration. The sources differ in content and query interfaces. In addition, they can be highly dynamic: sources can be added, removed, or updated over time. This paper introduces a novel information integration framework that leverages industry standards for web services (WSDL/SOAP) and ontology description languages (RDF/OWL), together with a commercial database product, IBM DB2 Information Integrator (DB2 II). Taking advantage of the data integration and query optimization capabilities of DB2 II, this paper focuses on the methodologies for transforming a user query into queries on the different sources and for combining the transformation results into a single query to DB2 II. With information sources wrapped as web services and annotated with regard to their contents, their query capabilities, and the logical relations between concepts, our query transformation engine is rooted in ontology-based reasoning. To the best of our knowledge, this is the first framework that uses web services as the interface to information sources and combines ontology-based reasoning, web services, semantic annotation of web services, and DB2 II to support Internet-scale information integration.