Real-time data warehouse loading methodology

A data warehouse provides information for analytical processing, decision making, and data mining tools. As the concept of the real-time enterprise evolves, the synchronization between transactional data and statically implemented data warehouses has been redefined. Traditional data warehouse systems have static schemas and static relationships between data, and therefore cannot support any dynamics in their structure and content. Their data is updated only periodically because they are not prepared for continuous data integration. For real-time enterprises with decision support needs, real-time data warehouses seem very promising. In this paper we present a methodology for adapting data warehouse schemas and end-user OLAP queries to efficiently support real-time data integration. To accomplish this, we use techniques such as table structure replication and query predicate restrictions for selecting data, which enable data to be loaded continuously into the data warehouse with minimal impact on query execution time. We demonstrate the efficiency of the method by analyzing its impact on query performance using the TPC-H benchmark, executing query workloads while simultaneously performing continuous data integration at various insertion rates.
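
The abstract names table structure replication and query predicate restrictions without giving details, so the following is only a minimal sketch of one possible reading, not the paper's actual implementation. It assumes a hypothetical fact table `sales` and an index-free replica `sales_rt` that receives the continuous inserts; queries read both tables with the same predicate, and a periodic merge moves replica rows into the main table. All table, column, and function names are illustrative assumptions, and SQLite is used only to keep the example self-contained.

```python
import sqlite3


def build_warehouse(conn):
    """Create a hypothetical fact table and its structural replica."""
    # Main table: keyed and indexed, as a periodically loaded table would be.
    conn.execute(
        "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.execute("CREATE INDEX idx_sales_region ON sales(region)")
    # Replica: same columns, but no key or index (an assumption of this
    # sketch), so that frequent small inserts stay cheap.
    conn.execute("CREATE TABLE sales_rt (sale_id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, "north", 100.0), (2, "south", 50.0)],
    )


def insert_realtime(conn, row):
    """Continuous integration path: new rows go to the replica only."""
    conn.execute("INSERT INTO sales_rt VALUES (?, ?, ?)", row)


def total_by_region(conn, region):
    """Adapted query: the same predicate restricts both tables."""
    sql = """
        SELECT COALESCE(SUM(amount), 0.0) FROM (
            SELECT amount FROM sales WHERE region = ?
            UNION ALL
            SELECT amount FROM sales_rt WHERE region = ?
        )
    """
    return conn.execute(sql, (region, region)).fetchone()[0]


def merge_replica(conn):
    """Periodic step: move replica rows into the main table and empty it."""
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT INTO sales SELECT * FROM sales_rt")
        conn.execute("DELETE FROM sales_rt")
```

In this sketch a query issued right after `insert_realtime` already sees the new row, while the indexed `sales` table is touched only when `merge_replica` runs. Whether the paper merges on a schedule, on a size threshold, or in some other way is not stated in the abstract.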
