On Matching Schemas Automatically

Abstract

Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. By contrast, previous research has proposed many techniques that partially automate the Match operation for specific application domains. We present a taxonomy that covers many of the existing approaches, and we describe these approaches in some detail. In particular, we distinguish between schema- and instance-level, element- and structure-level, and language- and constraint-based matchers. Based on our classification, we review several previous match implementations and indicate which part of the solution space each one covers. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
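The taxonomy's dimensions are easier to see in a concrete matcher. The code below is a minimal sketch, not an algorithm from this work. It is schema-level, because it uses only schema information and no data instances. It is element-level, because it compares individual elements and ignores structure. It is hybrid, combining a language-based criterion (name similarity) with a constraint-based one (data type compatibility). The two schemas, the `COMPATIBLE_TYPES` table, and the 0.5 threshold are hypothetical. Python's `difflib.SequenceMatcher` stands in for whatever name-matching technique a real matcher would use.

```python
from difflib import SequenceMatcher

# Hypothetical toy schemas: element name -> declared data type.
PO = {"PONo": "int", "CustName": "string", "ShipAddress": "string", "OrderDate": "date"}
PURCHASE_ORDER = {"OrderNumber": "int", "CustomerName": "string",
                  "Address": "string", "Date": "date"}

# Assumed compatibility rules for the constraint-based criterion:
# identical types are compatible, plus a few illustrative numeric pairs.
COMPATIBLE_TYPES = {("int", "decimal"), ("decimal", "int")}


def name_similarity(a: str, b: str) -> float:
    """Language-based criterion: normalized string similarity of two element names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def types_compatible(t1: str, t2: str) -> bool:
    """Constraint-based criterion: only type-compatible elements may match."""
    return t1 == t2 or (t1, t2) in COMPATIBLE_TYPES


def match(s1: dict, s2: dict, threshold: float = 0.5) -> dict:
    """Schema-level, element-level matcher.

    For each element of s1, consider only type-compatible elements of s2 and
    keep the one with the highest name similarity, provided that similarity
    reaches the threshold. Returns a mapping from s1 names to s2 names.
    """
    result = {}
    for e1, t1 in s1.items():
        candidates = [(name_similarity(e1, e2), e2)
                      for e2, t2 in s2.items() if types_compatible(t1, t2)]
        if candidates:
            score, best = max(candidates)
            if score >= threshold:
                result[e1] = best
    return result


if __name__ == "__main__":
    print(match(PO, PURCHASE_ORDER))
```

On these toy schemas, `CustName`, `ShipAddress`, and `OrderDate` are matched to `CustomerName`, `Address`, and `Date`. `PONo` stays unmatched even though `OrderNumber` is type-compatible, because its name similarity is only about 0.27. Cases like this one are why matchers combine name comparison with other criteria, such as synonym or abbreviation tables, instance-level information, or structure-level evidence.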
