Data Modifications and Versioning in Trio

The field of uncertain databases has recently attracted considerable interest. Many of the applications that motivate uncertain data management fundamentally rely on improving data quality over time through modifications as additional information becomes available, for example after analysis (as in scientific data management systems) or through user feedback (as in pay-as-you-go data integration). Incorporating data modifications while still meeting applications' need to access and query older data (as in hypothetical databases) requires lightweight versioning. This paper presents the first DBMS for uncertain data that supports data modifications and versioning. Our work is in the context of Trio, a project at Stanford for managing data uncertainty and lineage. We introduce SQL-based language constructs for data modifications and lightweight versioning in Trio's query language. We present an extended Trio data model, ULDB$^v$, and show how primitive modifications are applied to it, yielding versioned relations with uncertainty and lineage. We show that Trio's lineage feature enables an elegant approach to query answering in ULDB$^v$. We give efficient algorithms for propagating data modifications to materialized views. We have incorporated the data modification and versioning capabilities into the Trio system and validate our techniques through experiments.
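To make the idea of a versioned relation with uncertainty more concrete, the following toy sketch shows primitive modifications applied to x-tuples (sets of mutually exclusive alternatives, possibly "maybe" tuples) while older versions stay queryable. It is not Trio's ULDB$^v$ representation: the class names (`Alternative`, `XTuple`, `VersionedRelation`), the `[start, end)` version interval on each alternative, and the `maybe_from` field are hypothetical, borrowed from common transaction-time versioning practice. Confidences and lineage are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Alternative:
    """One alternative of an x-tuple, live during versions [start, end) (hypothetical encoding)."""
    value: tuple
    start: int                 # first version in which this alternative exists
    end: Optional[int] = None  # first version in which it no longer exists (None = still current)

    def live_at(self, v: int) -> bool:
        return self.start <= v and (self.end is None or v < self.end)


@dataclass
class XTuple:
    """Mutually exclusive alternatives; a maybe-tuple from version maybe_from onward."""
    alts: List[Alternative] = field(default_factory=list)
    maybe_from: Optional[int] = None


class VersionedRelation:
    """Toy versioned uncertain relation: each modification creates a new version,
    and superseded alternatives are closed off rather than physically removed."""

    def __init__(self) -> None:
        self.version = 0
        self.xtuples: List[XTuple] = []

    def insert(self, values: List[tuple], maybe: bool = False) -> None:
        self.version += 1
        v = self.version
        self.xtuples.append(XTuple([Alternative(val, v) for val in values], v if maybe else None))

    def delete(self, pred: Callable[[tuple], bool]) -> None:
        self.version += 1
        v = self.version
        for xt in self.xtuples:
            live = [a for a in xt.alts if a.end is None]
            hit = [a for a in live if pred(a.value)]
            for a in hit:
                a.end = v
            # Deleting only some alternatives means the x-tuple vanishes in the
            # worlds that chose a deleted alternative, so it becomes a maybe-tuple.
            if hit and len(hit) < len(live) and xt.maybe_from is None:
                xt.maybe_from = v

    def update(self, pred: Callable[[tuple], bool], fn: Callable[[tuple], tuple]) -> None:
        self.version += 1
        v = self.version
        for xt in self.xtuples:
            new_alts = []
            for a in xt.alts:
                if a.end is None and pred(a.value):
                    a.end = v                                  # close the old alternative
                    new_alts.append(Alternative(fn(a.value), v))  # open its replacement
            xt.alts.extend(new_alts)

    def snapshot(self, v: int) -> List[Tuple[List[tuple], bool]]:
        """The relation as of version v: (live alternative values, is_maybe) per x-tuple."""
        out = []
        for xt in self.xtuples:
            live = [a.value for a in xt.alts if a.live_at(v)]
            if live:
                out.append((live, xt.maybe_from is not None and xt.maybe_from <= v))
        return out


if __name__ == "__main__":
    r = VersionedRelation()
    r.insert([("Alice", "Paris"), ("Alice", "Rome")])                # version 1
    r.insert([("Bob", "Oslo")], maybe=True)                          # version 2
    r.update(lambda t: t[1] == "Rome", lambda t: (t[0], "Milan"))    # version 3
    r.delete(lambda t: t[1] == "Paris")                              # version 4
    print(r.snapshot(2))  # [([('Alice', 'Paris'), ('Alice', 'Rome')], False), ([('Bob', 'Oslo')], True)]
    print(r.snapshot(4))  # [([('Alice', 'Milan')], True), ([('Bob', 'Oslo')], True)]
```

In this sketch, deleting the Paris alternative at version 4 turns Alice's x-tuple into a maybe-tuple, because in the possible worlds that had chosen Paris the tuple no longer exists. Meanwhile `snapshot(2)` still returns the original two alternatives, which is the kind of access to older data that lightweight versioning is meant to preserve.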
