A Transactional Model for Data Warehouse Maintenance

A Data Warehouse Management System (DWMS) incrementally maintains materialized views by issuing maintenance queries to the data sources. To correct erroneous query results caused by concurrent source updates, state-of-the-art maintenance strategies typically apply compensation to resolve the conflicts. These strategies assume, however, that the source schemas remain stable over time. If a schema change occurs in any of the sources, an anomaly may arise: the maintenance or compensation queries themselves may be broken. We tackle this open problem by modeling the complete maintenance process as a special transaction, called a DWMS_Transaction. The anomaly problem can then be rephrased as the serializability of DWMS_Transactions, which allows us to apply well-established transaction theory to it. To achieve such serializability, we propose a multiversion concurrency control technique appropriate for loosely coupled environments with autonomous sources. Our solution, TxnWrap, is complementary to maintenance algorithms from the literature because it removes concurrency issues from their consideration. Experimental results confirm that TxnWrap achieves predictable, steady performance even under a varying rate of concurrency.
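The broken-query anomaly and the multiversion idea can be illustrated with a toy sketch. It is not the TxnWrap algorithm itself: the class `VersionedSource`, the function `maintenance_query`, the integer timestamps, and the sample relation are all hypothetical names and data chosen for illustration. The sketch only assumes that a wrapper retains earlier versions of a source relation, so that a maintenance query can read the state that was current when the update it handles was committed.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedSource:
    """Hypothetical wrapper that keeps every committed state of one source relation."""
    # Each entry is (commit_timestamp, column_names, rows).
    versions: list = field(default_factory=list)

    def commit(self, ts, columns, rows):
        # Record a new version; timestamps are assumed to arrive in increasing order.
        self.versions.append((ts, tuple(columns), [tuple(r) for r in rows]))

    def read_as_of(self, ts):
        # Return the newest version whose commit timestamp is <= ts.
        visible = [v for v in self.versions if v[0] <= ts]
        if not visible:
            raise LookupError(f"no version visible at timestamp {ts}")
        return visible[-1]


def maintenance_query(source, ts, wanted_column):
    """Project one column from the source state visible at timestamp ts."""
    _, columns, rows = source.read_as_of(ts)
    idx = columns.index(wanted_column)  # raises ValueError if the column is gone
    return [row[idx] for row in rows]


src = VersionedSource()
src.commit(1, ["id", "price"], [(1, 10), (2, 20)])  # data update to be maintained
src.commit(2, ["id"], [(1,), (2,)])                 # later schema change drops "price"

# Reading the latest state breaks the maintenance query for the update at ts=1.
try:
    maintenance_query(src, 2, "price")
    broken = False
except ValueError:
    broken = True

# Reading the version as of ts=1 isolates the query from the schema change.
result = maintenance_query(src, 1, "price")
```

In this sketch the query against the latest state fails because the column it references no longer exists, while the same query against the retained version succeeds. This is the sense in which versioned reads let maintenance processing behave as if it ran serially with respect to source updates.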
