Automating the Schema Matching Process for Heterogeneous Data Warehouses

A federated data warehouse is a logical integration of data warehouses, applicable when physical integration is impossible due to privacy policies or legal restrictions. To enable query translation in a federated approach, the schemas of the federated and the local warehouses must be matched. In this paper we present a procedure that matches the schema structures specific to the multidimensional model of data warehouses: facts, measures, dimensions, aggregation levels and dimensional attributes. Similarities between these warehouse-specific structures are computed using linguistic and structural comparison, and the resulting values are used to create the necessary mappings. We present restriction rules and recommendations for aggregation level matching, which constitutes the most complex part of the process. We also provide a software implementation of the entire process in order to verify it and to determine the appropriate selection metric for mapping different multidimensional structures.
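The similarity-then-mapping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from this paper: the linguistic measure (the `difflib.SequenceMatcher` ratio on lower-cased names), the structural measure (average best-match similarity of child elements), the weight `w_ling`, the threshold and the greedy one-to-one selection are all hypothetical choices, as are the names `Element`, `combined_sim` and `match`. The paper's restriction rules for aggregation levels are not reproduced here.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Element:
    """Hypothetical model of a multidimensional schema element
    (fact, measure, dimension, aggregation level or attribute)."""
    name: str
    kind: str
    children: list["Element"] = field(default_factory=list)


def linguistic_sim(a: str, b: str) -> float:
    # Assumption: character-level similarity of names stands in for linguistic comparison.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def structural_sim(a: Element, b: Element) -> float:
    # Assumption: structure is compared via the best-matching child of each element.
    if not a.children and not b.children:
        return 1.0  # two leaf elements are structurally identical
    if not a.children or not b.children:
        return 0.0  # a leaf never matches a non-leaf structurally
    best_a = [max(linguistic_sim(x.name, y.name) for y in b.children) for x in a.children]
    best_b = [max(linguistic_sim(y.name, x.name) for x in a.children) for y in b.children]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


def combined_sim(a: Element, b: Element, w_ling: float = 0.5) -> float:
    # Weighted sum of linguistic and structural similarity (hypothetical weighting).
    return w_ling * linguistic_sim(a.name, b.name) + (1 - w_ling) * structural_sim(a, b)


def match(local: list[Element], federated: list[Element],
          threshold: float = 0.6, w_ling: float = 0.5) -> dict[str, str]:
    """Greedy one-to-one mapping of local to federated elements of the same kind."""
    candidates = []
    for loc in local:
        for fed in federated:
            if loc.kind != fed.kind:
                continue  # only compare structures of the same kind
            score = combined_sim(loc, fed, w_ling)
            if score >= threshold:
                candidates.append((score, loc.name, fed.name))
    candidates.sort(reverse=True)  # highest similarity first
    used_local, used_fed, mapping = set(), set(), {}
    for _, loc_name, fed_name in candidates:
        if loc_name not in used_local and fed_name not in used_fed:
            mapping[loc_name] = fed_name
            used_local.add(loc_name)
            used_fed.add(fed_name)
    return mapping
```

Comparing only elements of the same kind (dimension with dimension, measure with measure) reflects the idea of matching warehouse-specific structures separately; the greedy selection is just one of several possible selection strategies, and the threshold and weight would need to be tuned against real schemas.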
