Incremental maintenance of consistent data warehouses
A data warehouse stores information integrated from distributed and possibly heterogeneous information sources. In effect, the warehouse stores materialized views over the source data. This dissertation studies the maintenance of warehouse views as the data sources are updated.
The first part of this dissertation presents a family of algorithms that incrementally and consistently maintain relational materialized views in a data warehouse. This view maintenance problem differs from the traditional one in that the view definition and the base data are decoupled, and the data sources are autonomous. We show that this decoupling can result in anomalies if traditional algorithms are applied. We formalize notions of consistency for warehouse views and present new algorithms that maintain consistency as the warehouse is updated. In addition, we develop simple, scalable algorithms for ensuring mutual consistency among multiple views at a warehouse. We also present the implementation of the algorithms in the WHIPS (WareHousing Information Project at Stanford) prototype and report related performance results.
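To make the anomaly concrete, the following is a minimal illustrative sketch and not the dissertation's own algorithm. It assumes a hypothetical view that joins `r1(W, X)` with `r2(X, Y)` and projects onto `W`, kept as a bag of tuples. The relation names, the tuples, and the `run` helper are all assumptions chosen for illustration. Two source updates occur before the warehouse's first maintenance query is evaluated, so the answer to that query already reflects the second update, and a traditional incremental algorithm counts one tuple twice. The `compensate` branch shows one possible remedy: subtracting the overlap between the two pending updates.

```python
from collections import Counter

def join_project_w(r1, r2):
    """Bag of W values from r1(W, X) joined with r2(X, Y) on X."""
    return Counter(w for (w, x) in r1 for (x2, _y) in r2 if x == x2)

def run(compensate):
    # Source state (autonomous; the warehouse cannot lock it).
    r1, r2 = [(1, 2)], []
    view = join_project_w(r1, r2)        # warehouse view, initially empty

    # Two updates happen at the source before any warehouse query arrives.
    u1 = (2, 3)
    r2.append(u1)                        # U1: insert into r2
    u2 = (4, 2)
    r1.append(u2)                        # U2: insert into r1

    # Query for U1, evaluated on the CURRENT source state (already sees U2).
    a1 = join_project_w(r1, [u1])
    # Query for U2, evaluated on the current source state.
    a2 = join_project_w([u2], r2)
    if compensate:
        # Assumed remedy: remove the part of a2 that a1 already covered.
        a2 = a2 - join_project_w([u2], [u1])

    view = view + a1 + a2
    return view, join_project_w(r1, r2)  # (maintained view, recomputed view)

naive_view, truth = run(compensate=False)
fixed_view, _ = run(compensate=True)
print("naive:", dict(naive_view), "correct:", dict(truth), "compensated:", dict(fixed_view))
```

In this sketch the uncompensated run leaves the value 4 in the view twice, whereas recomputing the view from the final source state yields it once. The compensated run matches the recomputed view.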
The second part of this dissertation studies how to maintain graph-structured materialized views. A graph-structured database consists of records containing identifiers of other records. The data could represent semi-structured information such as Web pages, documents, XML data, or data integrated from heterogeneous data sources. We define views and materialized views for such graph-structured data, analyzing options for representing record identity and references in the view. We then develop incremental maintenance algorithms for these views, discuss how to realize these algorithms in a data warehouse, and study how to maintain the warehouse views without accessing base data.
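As a rough sketch of the data model described above, and of two ways a materialized view might represent record identity, consider the following. Everything here is an assumption for illustration: the `Record` class, the label-based selection used as the view definition, and the `v_` prefix for view-local identifiers do not come from the dissertation. The first option stores only base identifiers in the view. The second copies the selected records under fresh identifiers and rewrites references that point inside the view.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    oid: str                                   # record identifier
    label: str
    value: object = None
    refs: list = field(default_factory=list)   # identifiers of other records

# A tiny graph-structured database: pages referencing titles and other pages.
db = {
    "p1": Record("p1", "page", refs=["t1", "p2"]),
    "t1": Record("t1", "title", "Home"),
    "p2": Record("p2", "page", refs=["t2"]),
    "t2": Record("t2", "title", "About"),
}

def select(db, label):
    """Hypothetical view definition: all records with the given label."""
    return [oid for oid, rec in db.items() if rec.label == label]

def materialize_by_reference(db, label):
    """Option A: the view stores base identifiers only."""
    return set(select(db, label))

def materialize_by_copy(db, label):
    """Option B: the view stores copies under view-local identifiers."""
    mapping = {oid: "v_" + oid for oid in select(db, label)}
    view = {}
    for oid, vid in mapping.items():
        rec = db[oid]
        # References into the view are rewritten; others keep the base identifier.
        view[vid] = Record(vid, rec.label, rec.value,
                           [mapping.get(r, r) for r in rec.refs])
    return view, mapping

by_ref = materialize_by_reference(db, "page")
by_copy, mapping = materialize_by_copy(db, "page")
print(sorted(by_ref), by_copy["v_p1"].refs)
```

Under option A the view must consult the base data to read record contents. Under option B it can be read on its own, at the cost of maintaining the identifier mapping when the base data changes.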