A Compensation-based Approach for Materialized View Maintenance in Distributed Environments

Data integration over multiple heterogeneous data sources has become increasingly important for modern applications. The integrated data is usually stored in materialized views (MVs) to allow better access, performance and high availability. MVs must be maintained after the data sources change. In a loosely-coupled environment, such as the Data Grid, the data sources are autonomous. Hence source updates can be concurrent and can cause erroneous maintenance results. State-of-the-art maintenance strategies apply compensating queries to correct such errors, under the restrictive assumption that all source schemata remain static over time. However, in such dynamic environments, the data sources may change not only their data but also their schema, query capabilities or semantics. Consequently, either the maintenance queries or the compensating queries would fail. In this paper, we propose a framework called DyDa that overcomes these limitations and handles both source data updates and schema changes. First, we identify three types of maintenance anomalies, caused by data updates, rename schema changes, drop schema changes, or combinations of these. We propose a compensation algorithm to solve the first two types of anomalies. We show that the third type of anomaly is caused by the violation of dependencies between the maintenance processes, and we propose a detection and correction algorithm, based on a formalization of these dependencies, to remove such anomalies. A new view adaptation algorithm is designed to incrementally adapt the view to some complex updates introduced by the correction algorithm. Put together, these algorithms are the first complete solution to the concurrency problems for MV maintenance in loosely-coupled environments. We have implemented the DyDa system. The experimental results show that our new concurrency handling strategy imposes minimal overhead on normal data update processing while providing the extended functionality of maintaining the materialized views even under concurrent schema changes.
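
The kind of error that a compensating query corrects can be illustrated with a minimal Python sketch. It is not the DyDa algorithm. It assumes two hypothetical relations R(a, b) and S(b, c), a join view over them, bag semantics, and insert-only data updates with no schema changes; every name in it is illustrative.

```python
from collections import Counter

def join(r_rows, s_rows):
    """Bag join of R(a, b) and S(b, c) on b; Counters hold row multiplicities."""
    out = Counter()
    for (a, b), n in r_rows.items():
        for (b2, c), m in s_rows.items():
            if b == b2:
                out[(a, b, c)] += n * m
    return out

def run(compensate):
    """Maintain V = R join S under two concurrent inserts (hypothetical data)."""
    r = Counter({(1, "x"): 1})
    s = Counter({("x", 10): 1})
    view = join(r, s)

    # Both sources commit their inserts before the view manager sends any query.
    delta_r = Counter({(2, "y"): 1})
    delta_s = Counter({("y", 20): 1})
    r.update(delta_r)
    s.update(delta_s)

    # Maintenance for delta_r: the query is evaluated against S, which already
    # contains delta_s, so the answer includes the effect of delta_r join delta_s.
    answer = join(delta_r, s)
    if compensate:
        # Compensation: remove the part caused by the concurrent update,
        # because maintenance for delta_s will contribute it later.
        answer.subtract(join(delta_r, delta_s))
        answer = +answer  # drop rows whose count fell to zero
    view.update(answer)

    # Maintenance for delta_s: R already contains delta_r and nothing else is pending.
    view.update(join(r, delta_s))

    return view, join(r, s)  # maintained view, and the view recomputed from scratch

if __name__ == "__main__":
    for flag in (False, True):
        maintained, recomputed = run(flag)
        print("compensate =", flag, "| consistent:", maintained == recomputed)
```

Without compensation, the row produced by joining the two concurrent inserts is added to the view twice; with compensation, the maintained view matches a full recomputation. Handling a concurrent rename or drop of a source attribute, which would make the maintenance query itself fail, is outside the scope of this sketch.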
