10^(10^6) Worlds and Beyond: Efficient Representation and Processing of Incomplete Information

We present a decomposition-based approach to managing incomplete information. We introduce world-set decompositions (WSDs), a space-efficient and complete representation system for finite sets of worlds. We study the problem of efficiently evaluating relational algebra queries on world-sets represented by WSDs. We also evaluate our technique experimentally in a large census data scenario and show that it is both scalable and efficient.
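The decomposition idea can be illustrated with a small sketch. It is a toy model under stated assumptions, not the paper's data structure or algorithms: the `components` list, the field names (`t1.ssn`, `t1.married`, and so on), and the helpers `count_worlds` and `worlds` are all hypothetical. The assumption is that a world-set is stored as a product of independent components, each listing the alternative values for a group of correlated fields, so that a world is obtained by picking one alternative from every component.

```python
from itertools import product
from math import prod

# Hypothetical toy decomposition: each component lists the alternative
# assignments for a group of correlated fields. Components are assumed
# to be independent of each other.
components = [
    # The two SSN fields are correlated, so they share one component.
    [{"t1.ssn": 185, "t2.ssn": 186}, {"t1.ssn": 785, "t2.ssn": 185}],
    # Fields with a single certain value form one-alternative components.
    [{"t1.name": "Smith"}],
    [{"t2.name": "Brown"}],
    # An uncertain field that is independent of everything else.
    [{"t1.married": 1}, {"t1.married": 2}],
]


def count_worlds(wsd):
    """Number of represented worlds: the product of the component sizes."""
    return prod(len(component) for component in wsd)


def worlds(wsd):
    """Enumerate every world by choosing one alternative per component."""
    for choice in product(*wsd):
        world = {}
        for local_assignment in choice:
            world.update(local_assignment)
        yield world


def stored_alternatives(wsd):
    """Number of alternatives actually stored across all components."""
    return sum(len(component) for component in wsd)


if __name__ == "__main__":
    print(count_worlds(components), "worlds from",
          stored_alternatives(components), "stored alternatives")
    for w in worlds(components):
        print(w)
```

In this sketch the stored size grows with the sum of the component sizes, while the number of represented worlds grows with their product. For example, 20 independent two-alternative components would store 40 alternatives but represent 2^20 worlds, which is the kind of gap that makes a decomposition attractive for very large world-sets.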
