Semantic Integration and Query of Heterogeneous Information Sources

Developing intelligent tools for integrating information extracted from multiple heterogeneous sources is a challenging problem that must be solved to effectively exploit the numerous sources available on-line in global information systems. In this paper, we propose intelligent, tool-supported techniques for information extraction and integration from both structured and semistructured data sources. An object-oriented language called ODLI3, derived from the ODMG standard and with an underlying Description Logics, is introduced for information extraction. ODLI3 descriptions of the source schemas are exploited first to set a shared vocabulary for the sources. Information integration is then performed semi-automatically by exploiting the ODLI3 descriptions of the source schemas with a combination of Description Logics and clustering techniques, giving rise to a virtual integrated view of multiple sources. Since the ultimate goal of providing an integrated view is to query it independently of the location and heterogeneity of the sources, a module for the reformulation of queries at the sources, with semantic optimization capabilities, is provided. The integration techniques described in the paper have been implemented in the MOMIS system, which is based on a conventional mediator architecture.

This research has been partially funded by the Italian MURST ex-40% INTERDATA project, Metodologie e Tecnologie per la Gestione di Dati e Processi su Reti Internet e Intranet. A preliminary version of the paper appears in the proceedings of the IJCAI-99 Workshop on Intelligent Information Integration, 31 July 1999, Stockholm.
