Dealing with Semantic Heterogeneity During Data Integration

Multi-source information systems, such as data warehouse systems, involve heterogeneous sources. In this paper, we deal with the semantic heterogeneity of data instances. Problems may occur when sources are confronted whenever different levels of denomination have been used for the same value, e.g. "vermilion" in one source and "red" in another. We propose to manage this semantic heterogeneity by using a linguistic dictionary. "Semantic operators" allow linguistic flexibility in queries; e.g. two tuples with the values "red" and "vermilion" could match in a semantic join on the "color" attribute. A distinctive feature of our approach is that it states the scope of the flexibility by defining classes of equivalent values by means of "priority nodes". These nodes are used as parameters that let the user define the scope of the flexibility in a very natural manner, without specifying any distance.
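The abstract does not spell out how priority nodes induce classes of equivalent values, so the following is only a minimal sketch under one plausible reading: each value is mapped to its closest ancestor in the dictionary's hypernym hierarchy that is a priority node, and two tuples match in a semantic join when their values fall in the same class. The toy dictionary `HYPERNYM` and the names `equivalence_class`, `semantic_join` and `priority_nodes` are hypothetical illustrations, standing in for a real linguistic dictionary such as WordNet; they are not the paper's actual operators.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, str]

# Hypothetical toy linguistic dictionary (an assumption, not the paper's data):
# each term points to its more general term (hypernym). Must be acyclic.
HYPERNYM: Dict[str, str] = {
    "vermilion": "red",
    "scarlet": "red",
    "crimson": "red",
    "navy": "blue",
    "azure": "blue",
    "red": "color",
    "blue": "color",
}


def equivalence_class(value: str, priority_nodes: Set[str]) -> str:
    """Return the representative of value's class: the closest ancestor
    (the value itself included) that is a priority node. A value with no
    priority ancestor forms a singleton class represented by itself."""
    node: Optional[str] = value
    while node is not None:
        if node in priority_nodes:
            return node
        node = HYPERNYM.get(node)
    return value


def semantic_join(left: List[Row], right: List[Row], attr: str,
                  priority_nodes: Set[str]) -> List[Tuple[Row, Row]]:
    """Hypothetical semantic equi-join: rows match when their `attr` values
    belong to the same class of equivalent values (no distance involved)."""
    # Bucket right-hand rows by the class of their attribute value.
    buckets: Dict[str, List[Row]] = defaultdict(list)
    for rrow in right:
        buckets[equivalence_class(rrow[attr], priority_nodes)].append(rrow)
    # Probe the buckets with each left-hand row's class.
    result: List[Tuple[Row, Row]] = []
    for lrow in left:
        for rrow in buckets.get(equivalence_class(lrow[attr], priority_nodes), []):
            result.append((lrow, rrow))
    return result


# Two toy sources using different levels of denomination for colors.
source_a: List[Row] = [{"id": "a1", "color": "vermilion"},
                       {"id": "a2", "color": "navy"}]
source_b: List[Row] = [{"id": "b1", "color": "red"},
                       {"id": "b2", "color": "scarlet"},
                       {"id": "b3", "color": "green"}]

if __name__ == "__main__":
    for nodes in ({"red", "blue"}, {"color"}, set()):
        pairs = semantic_join(source_a, source_b, "color", nodes)
        print(sorted(nodes), [(x["id"], y["id"]) for x, y in pairs])
    # ['blue', 'red'] [('a1', 'b1'), ('a1', 'b2')]
    # ['color'] [('a1', 'b1'), ('a1', 'b2'), ('a2', 'b1'), ('a2', 'b2')]
    # [] []
```

In this sketch, the choice of priority nodes alone controls the scope of the flexibility: with {"red", "blue"}, "vermilion" joins "red" and "scarlet" but not "navy"; raising the priority node to "color" widens every class, and an empty set reduces the semantic join to an ordinary equality join. No numeric distance threshold is ever specified.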