Representing uncertain data: models, properties, and algorithms

In general terms, an uncertain relation encodes a set of possible certain relations. There are many ways to represent uncertainty, ranging from alternative values for attributes to rich constraint languages. Among the possible models for uncertain data, there is a tension between simple, intuitive models, which tend to be incomplete, and complete models, which tend to be nonintuitive and more complex than many applications need. We present a space of models for representing uncertain data based on a variety of uncertainty constructs and tuple-existence constraints, and we explore a number of their properties. We study completeness of the models and closure under relational operations, and we give results relating closure to completeness. We then examine whether different models guarantee unique representations of uncertain data; for those that do not, we provide complexity results and algorithms for testing equivalence of representations. Next, we consider the problem of minimizing the size of representations in these models, showing that minimizing the number of tuples also minimizes the size of constraints. We show that minimization is intractable in general and study the more restricted problem of maintaining minimality incrementally when performing operations. Finally, we present several results on the problem of approximating uncertain data in an insufficiently expressive model.
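To make "an uncertain relation encodes a set of possible certain relations" concrete, here is a minimal Python sketch under assumptions of our own. The encoding (each "x-tuple" is a list of alternative rows plus a "maybe" flag) and the names `possible_worlds` and `equivalent` are hypothetical illustrations, not the paper's constructs or algorithms. Equivalence is checked by brute-force enumeration of possible worlds, which is exponential and says nothing about the complexity results mentioned above.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple

# Hypothetical encoding (an assumption, not the paper's definitions):
Row = Tuple[str, ...]
# An x-tuple is (alternatives, maybe). In every possible world exactly one
# alternative is present, unless `maybe` is True, in which case the
# x-tuple may also contribute no row at all.
XTuple = Tuple[List[Row], bool]
World = FrozenSet[Row]


def possible_worlds(relation: List[XTuple]) -> Set[World]:
    """Enumerate the certain relations encoded by an uncertain relation."""
    choices = []
    for alternatives, maybe in relation:
        options = [(alt,) for alt in alternatives]  # pick one alternative
        if maybe:
            options.append(())  # or leave the x-tuple out entirely
        choices.append(options)
    # One world per combination of choices; set semantics merges duplicate rows.
    return {frozenset(row for part in combo for row in part)
            for combo in product(*choices)}


def equivalent(r1: List[XTuple], r2: List[XTuple]) -> bool:
    """Two representations are equivalent iff they encode the same worlds.

    Brute force: exponential in the number of x-tuples, so only usable
    on tiny inputs.
    """
    return possible_worlds(r1) == possible_worlds(r2)


# Usage: one x-tuple with two alternative cities.
r1 = [([("Alice", "NY"), ("Alice", "LA")], False)]
print(len(possible_worlds(r1)))  # → 2

# Two syntactically different representations of the same single world.
r2 = [([("Bob", "SF")], False), ([("Bob", "SF")], True)]
r3 = [([("Bob", "SF")], False)]
print(equivalent(r2, r3))  # → True
```

Here `r2` and `r3` differ syntactically but encode the same world, so even this toy encoding does not guarantee unique representations, and `r3` is the smaller of the two. Finding a smallest equivalent representation in general is the minimization problem the abstract reports as intractable for its models; the sketch does not attempt it.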
