Integrating heterogeneous data warehouses using XML technologies

Data warehousing has been widely adopted by contemporary enterprises. Sharing information across organizations requires integrating heterogeneous data warehouses, which makes a systematic methodology for integrating them over the Internet or proprietary extranets an urgent need. Traditionally, researchers have used a canonical data format as the medium for logical data integration among heterogeneous systems. In this paper, to take full advantage of the Internet, we propose a framework, and develop a prototype, for integrating heterogeneous data warehouses with XML technologies. We first formally define the elements of a data warehouse and discuss the semantic conflicts that can arise among heterogeneous data cubes. We then present the system architecture and the procedures for resolving each kind of semantic conflict. For local data cubes with different schemas, we define a global XML Schema that integrates the local cube structures, and we transform each local cube into an XML document conforming to that global schema. Pre-defined XQuery commands then combine these transformed documents into a single unified XML document, which serves as the global cube. The integrated global cube can be stored and manipulated directly in native XML databases. The proposed methodology lets global users browse the global cube, or pose multi-dimensional expressions (MDX) queries against it, in the same way they would query a local cube.
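The pipeline in the abstract (map each local cube onto a global schema, resolve semantic conflicts, merge the conforming documents into a global cube, then query it) can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is a minimal illustration under stated assumptions, not the authors' prototype. The element names (`cube`, `cell`, `dim`, `measure`), the site names, and the `SITE_MAPPINGS` table are hypothetical. Only two conflict types are resolved: differing dimension and measure names, and differing measure scales. The merge step is plain Python standing in for the paper's pre-defined XQuery commands, and it assumes an additive measure whose values are summed when two sites report the same coordinates. `roll_up` stands in for a simple MDX aggregation. No XML Schema validation is performed, so conformance to the global schema holds only by construction.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical global schema: each <cell> holds the global dimension members
# and one additive measure expressed in a common unit (USD).
GLOBAL_DIMS = ("Region", "Year")
GLOBAL_MEASURE = "Sales"

# Hypothetical per-site mappings resolving two semantic conflicts:
# different dimension/measure names, and different measure scales.
SITE_MAPPINGS = {
    "siteA": {"dims": {"region": "Region", "year": "Year"},
              "measure": "sales", "scale": 1},
    "siteB": {"dims": {"area": "Region", "yr": "Year"},
              "measure": "revenue_k", "scale": 1000},  # thousands of USD
}


def local_to_xml(site, rows):
    """Transform one local cube (a list of dicts) into an XML document that
    follows the hypothetical global schema."""
    mapping = SITE_MAPPINGS[site]
    root = ET.Element("cube", source=site)
    for row in rows:
        cell = ET.SubElement(root, "cell")
        for local_name, global_name in mapping["dims"].items():
            ET.SubElement(cell, "dim", name=global_name).text = str(row[local_name])
        value = row[mapping["measure"]] * mapping["scale"]  # rescale to USD
        ET.SubElement(cell, "measure", name=GLOBAL_MEASURE).text = str(value)
    return root


def merge_cubes(docs):
    """Stand-in for the XQuery merge: combine conforming documents into one
    global cube, summing the measure of cells with identical coordinates."""
    totals = defaultdict(float)
    for doc in docs:
        for cell in doc.iter("cell"):
            members = {d.get("name"): d.text for d in cell.findall("dim")}
            key = tuple(members[name] for name in GLOBAL_DIMS)
            totals[key] += float(cell.find("measure").text)
    root = ET.Element("cube", source="global")
    for key in sorted(totals):
        cell = ET.SubElement(root, "cell")
        for name, member in zip(GLOBAL_DIMS, key):
            ET.SubElement(cell, "dim", name=name).text = member
        ET.SubElement(cell, "measure", name=GLOBAL_MEASURE).text = str(totals[key])
    return root


def roll_up(cube, dim):
    """Aggregate the global cube along one dimension, roughly what a simple
    MDX query placing that dimension on one axis would return."""
    result = defaultdict(float)
    for cell in cube.iter("cell"):
        member = next(d.text for d in cell.findall("dim") if d.get("name") == dim)
        result[member] += float(cell.find("measure").text)
    return dict(result)


if __name__ == "__main__":
    site_a = [{"region": "North", "year": 2005, "sales": 1200.0},
              {"region": "South", "year": 2005, "sales": 800.0}]
    site_b = [{"area": "North", "yr": 2005, "revenue_k": 0.5},
              {"area": "East", "yr": 2006, "revenue_k": 2.0}]
    global_cube = merge_cubes([local_to_xml("siteA", site_a),
                               local_to_xml("siteB", site_b)])
    print(roll_up(global_cube, "Region"))
    # prints {'East': 2000.0, 'North': 1700.0, 'South': 800.0}
```

Summing at shared coordinates is only one possible reconciliation rule. If sites report overlapping rather than complementary facts, a different rule (for example, preferring one trusted source) would be needed in the merge step.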
