Tracing the lineage of view data in a warehousing environment

We consider the view data lineage problem in a warehousing environment: for a given data item in a materialized warehouse view, we want to identify the set of source data items that produced the view item. We formally define the lineage problem, develop lineage tracing algorithms for relational views with aggregation, and propose mechanisms for performing consistent lineage tracing in a multisource data warehousing environment. Our results can form the basis of a tool that allows analysts to browse warehouse data, select view tuples of interest, and then “drill through” to examine the exact source tuples that produced them.
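
As an informal illustration of the problem statement only (not the tracing algorithms developed in the paper), the following Python sketch builds a small aggregation view and traces one view tuple back to the source rows in its group. The table `SALES`, the view `total_by_store`, and the helper `trace_lineage` are hypothetical names introduced for this example, and the group-membership rule used here is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical source table: (store, item, amount) tuples.
SALES = [
    ("s1", "pen", 3),
    ("s1", "ink", 5),
    ("s2", "pen", 4),
    ("s2", "pad", 2),
]


def total_by_store(rows):
    """Aggregation view, roughly: SELECT store, SUM(amount) ... GROUP BY store."""
    totals = defaultdict(int)
    for store, _item, amount in rows:
        totals[store] += amount
    return sorted(totals.items())


def trace_lineage(view_tuple, rows):
    """Assumed rule: the lineage of a view tuple is every source row in its group."""
    store, _total = view_tuple
    return [row for row in rows if row[0] == store]


view = total_by_store(SALES)                # materialized view contents
lineage = trace_lineage(("s1", 8), SALES)   # source rows behind one view tuple
```

Re-evaluating the view over the traced rows alone reproduces the selected view tuple, which is the intuition behind "the source tuples that produced the view tuple".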
