From complete to incomplete information and back

Incomplete information arises naturally in numerous data management applications. Recently, several researchers have studied query processing in the context of incomplete information. Most work has combined the syntax of a traditional query language, such as relational algebra, with a nonstandard semantics such as certain or ranked possible answers. There are now also languages with special features for dealing with uncertainty. However, by the standards of the data management community, no language has yet been proposed that can be considered a natural analog of SQL or relational algebra for the case of incomplete information. In this paper we propose such a language, World-set Algebra, which satisfies the robustness criteria and analogies to relational algebra that we expect. The language supports the contemplation of alternatives and can thus map from a complete database to an incomplete one comprising several possible worlds. We show that World-set Algebra is conservative over relational algebra in the sense that any query that maps from a complete database to a complete database (a complete-to-complete query) is equivalent to a relational algebra query. Moreover, we give an efficient algorithm for effecting this translation. We then study algebraic query optimization of such queries. We argue that query languages with explicit constructs for handling uncertainty allow many real-world decision support queries to be expressed more naturally and simply. The results of this paper not only suggest a language for specifying queries in this way, but also allow for their efficient evaluation in any relational database management system.
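To make the idea of a complete-to-complete query concrete, here is a toy sketch in Python built on our own assumptions. A complete database is a single relation R(A, B) stored as a set of tuples. The names `choice_of_a`, `certain`, `possible`, and `division_ra` are hypothetical stand-ins, not operators taken from the paper's definitions. The sketch splits R into one possible world per A-value, projects each world onto B, and then keeps only the B-values present in every world. This yields a complete result again, and for this one query it coincides with a plain relational-algebra, division-style expression. It illustrates the conservativity claim on a single hand-picked example; it is not the paper's translation algorithm.

```python
from typing import FrozenSet, List, Tuple

# Toy model (assumption): a complete database is one binary relation R(A, B)
# stored as a set of tuples; a world-set is a list of relations.
Row = Tuple[str, str]
Relation = FrozenSet[Row]
Unary = FrozenSet[str]


def choice_of_a(r: Relation) -> List[Relation]:
    """Hypothetical choice-of on A: one possible world per distinct A-value,
    each world holding exactly the tuples of R with that A-value."""
    a_values = sorted({a for a, _ in r})
    return [frozenset(t for t in r if t[0] == a) for a in a_values]


def project_b(world: Relation) -> Unary:
    """pi_B evaluated independently inside one world."""
    return frozenset(b for _, b in world)


def certain(worlds: List[Unary]) -> Unary:
    """Close the world-set: keep values present in every world.
    An empty world-set yields the empty relation (a choice made for this toy)."""
    if not worlds:
        return frozenset()
    result = worlds[0]
    for w in worlds[1:]:
        result = result & w
    return result


def possible(worlds: List[Unary]) -> Unary:
    """Close the world-set: keep values present in at least one world."""
    result: Unary = frozenset()
    for w in worlds:
        result = result | w
    return result


def division_ra(r: Relation) -> Unary:
    """Plain relational algebra: pi_B(R) - pi_B((pi_A(R) x pi_B(R)) - R),
    i.e. the B-values paired with every A-value in R."""
    pa = {a for a, _ in r}
    pb = {b for _, b in r}
    missing = {(a, b) for a in pa for b in pb} - r
    return frozenset(pb - {b for _, b in missing})


# Hypothetical example data: stores (A) and the products (B) they carry.
R: Relation = frozenset({
    ("s1", "p1"), ("s1", "p2"),
    ("s2", "p1"), ("s2", "p3"),
    ("s3", "p1"), ("s3", "p2"),
})
worlds = [project_b(w) for w in choice_of_a(R)]
print(sorted(certain(worlds)))             # ['p1']
print(sorted(possible(worlds)))            # ['p1', 'p2', 'p3']
print(certain(worlds) == division_ra(R))   # True
```

Taking the union of the worlds instead (`possible`) simply returns pi_B(R), another complete-to-complete query with an obvious relational-algebra equivalent. Only the intermediate step, a set of several worlds, lies outside what a single relational-algebra query produces.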
