Modeling and Querying Uncertain Relational Databases: a Survey of Approaches Based on the Possible Worlds Semantics

This paper surveys the most representative approaches to querying databases that contain ill-known data, from the pioneering work of Codd and Lipski to very recent proposals. The study focuses on approaches with a clear and sound semantics based on the notion of possible worlds. Three types of queries are considered: (i) those about attribute values (in an algebraic or SQL-like framework), (ii) those about the properties satisfied by a given set of worlds (i.e., a set of instances of an imprecise database), and (iii) those about the representation of uncertain data. For the first two types, we emphasize that a trade-off must be found between the expressivity of the model and the tractability of queries in the context of that model.
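
To make the notion of possible worlds concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a model taken from any of the surveyed approaches: the toy relation, the or-set encoding of ill-known values, and the helper names (`possible_worlds`, `possible_and_certain`, `some_world_satisfies`) are all hypothetical. It shows a query of type (i), about attribute values, and a query of type (ii), about a property of the set of worlds.

```python
from itertools import product

# Hypothetical uncertain relation: each attribute holds a set of candidate
# values (an or-set); a singleton set means the value is known precisely.
uncertain_db = [
    {"name": {"Ann"}, "city": {"Paris", "Lyon"}},
    {"name": {"Bob"}, "city": {"Paris"}},
]

def possible_worlds(db):
    """Yield every ordinary instance obtained by picking one candidate per or-set."""
    per_tuple = []
    for row in db:
        attrs = sorted(row)
        combos = product(*(sorted(row[a]) for a in attrs))
        per_tuple.append([dict(zip(attrs, combo)) for combo in combos])
    for world in product(*per_tuple):
        yield list(world)

def select_names(world, city):
    """Ordinary selection-projection evaluated in a single world."""
    return {t["name"] for t in world if t["city"] == city}

def possible_and_certain(db, city):
    """Type (i) query: names that appear in some world / in every world."""
    answers = [select_names(w, city) for w in possible_worlds(db)]
    possible = set().union(*answers)
    certain = set.intersection(*answers)
    return possible, certain

def some_world_satisfies(db, prop):
    """Type (ii) query: does at least one world satisfy the property?"""
    return any(prop(w) for w in possible_worlds(db))

possible, certain = possible_and_certain(uncertain_db, "Paris")
everyone_in_paris = some_world_satisfies(
    uncertain_db, lambda w: all(t["city"] == "Paris" for t in w)
)
```

Enumerating worlds explicitly, as done here, grows exponentially with the number of ill-known values. This is one way to see why a trade-off between expressivity and tractability arises.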
