Two approaches to the integration of heterogeneous data warehouses

Abstract

In this paper we address the problem of integrating independent and possibly heterogeneous data warehouses, a problem that has so far received little attention but that arises very often in practice. We start by tackling the basic issue of matching heterogeneous dimensions and provide a number of general properties that a dimension matching should fulfill. We then propose two different integration approaches that try to enforce matchings satisfying these properties. The first approach addresses a scenario of loosely coupled integration, in which we only need to identify the information common to the data sources and perform join operations over the original sources. The second approach addresses a scenario of tightly coupled integration: its goal is to derive a materialized view by merging the sources, and queries are then performed against this view. We also illustrate the architecture and functionality of a practical system that we have developed to demonstrate the effectiveness of our integration strategies.
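The difference between the two integration scenarios can be illustrated with a toy example. The sketch below is not the authors' algorithm. It assumes two hypothetical data marts whose facts are already aggregated at a shared "city" level, plus a hand-written, one-to-one member matching (`matching`). The functions `drill_across` and `merge_view` and all the data are illustrative names introduced here. The loosely coupled case keeps only matched common members and joins the measures of the original sources. The tightly coupled case materializes one merged view over all members, which is simplified here to a full outer join padded with `None`.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fact tables: each maps a member of a "city" dimension level
# to an aggregated measure value.
SalesFacts = Dict[str, float]
StockFacts = Dict[str, int]


def translate(stock: StockFacts, matching: Dict[str, str]) -> StockFacts:
    """Rename source-B members into source-A vocabulary.

    Assumes the matching is one-to-one; unmatched members keep their name."""
    return {matching.get(m, m): v for m, v in stock.items()}


def drill_across(sales: SalesFacts, stock: StockFacts,
                 matching: Dict[str, str]) -> Dict[str, Tuple[float, int]]:
    """Loosely coupled sketch: join the sources on their common members only."""
    translated = translate(stock, matching)
    common = sales.keys() & translated.keys()
    return {m: (sales[m], translated[m]) for m in sorted(common)}


def merge_view(sales: SalesFacts, stock: StockFacts,
               matching: Dict[str, str]
               ) -> Dict[str, Tuple[Optional[float], Optional[int]]]:
    """Tightly coupled sketch: materialize one view over the union of members,
    using None where a source provides no value (a simplifying assumption)."""
    translated = translate(stock, matching)
    members = sales.keys() | translated.keys()
    return {m: (sales.get(m), translated.get(m)) for m in sorted(members)}


# Hypothetical example data: source B names cities in Italian.
sales = {"Rome": 120.0, "Milan": 95.5, "Turin": 40.0}
stock = {"Roma": 300, "Milano": 210, "Naples": 80}
matching = {"Roma": "Rome", "Milano": "Milan"}

print(drill_across(sales, stock, matching))
# {'Milan': (95.5, 210), 'Rome': (120.0, 300)}
print(merge_view(sales, stock, matching))
# {'Milan': (95.5, 210), 'Naples': (None, 80), 'Rome': (120.0, 300), 'Turin': (40.0, None)}
```

In the first case, queries touch the original sources every time and discard the members that are not shared. In the second case, the merged view is computed once and then queried directly, at the cost of handling values that one source does not provide.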