Approximate Probabilistic Query Answering over Inconsistent Databases

The problem of managing and querying inconsistent databases has been deeply investigated in the last few years. Most of the approaches proposed so far rely on the notions of repair (a minimal set of delete/insert operations that makes the database consistent) and consistent query answer (the answer to a query is obtained by considering the set of "repaired" databases). Since consistent query answering is hard in the general case, most of the proposed techniques have exponential complexity, although the problem becomes polynomial for special classes of constraints and queries. A second problem with most of the proposed approaches is that repairs do not take update operations into account (they consider delete and insert operations only). This paper presents a general framework in which constraints consist of functional dependencies and queries may be expressed in positive relational algebra. The framework allows us to compute certain query answers (i.e. tuples derivable from all or from none of the repaired databases) and uncertain query answers (i.e. tuples derivable from a proper, non-empty subset of the repaired databases). Each tuple in the answer is associated with a probability, which depends on the number of repaired databases from which the tuple can be derived. In the proposed framework, databases are repaired by means of update operations, and the repaired databases are stored as a single "condensed" database, so that every repaired database can be derived by "expanding" it. A condensed database can be rewritten into a probabilistic database in which each tuple is associated with an event (i.e. a boolean formula) and, thus, with a probability value. The probabilistic query answer can then be computed by querying the resulting probabilistic database. As querying probabilistic databases is #P-complete, approximate probabilistic answers that are computable in polynomial time are considered.
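As a rough illustration of the idea that a tuple's probability depends on the number of repaired databases from which it can be derived, the following minimal Python sketch enumerates repairs explicitly. It is not the paper's algorithm: the relation `emp`, the functional dependency name -> city, the helper names (`condense`, `expand`, `answer_probabilities`, `names_in_rome`), and the assumption that every repaired database is equally likely are all hypothetical choices made for the example. A repair is modelled here as an update that keeps exactly one of the conflicting city values for each name.

```python
from fractions import Fraction
from itertools import product

# Hypothetical relation Emp(name, city) that violates the FD name -> city:
# "ann" is associated with two different cities.
emp = [("ann", "Rome"), ("ann", "Milan"), ("bob", "Rome")]


def condense(rows):
    """Build a toy 'condensed' database: one entry per name, holding
    the set of candidate city values found for that name."""
    condensed = {}
    for name, city in rows:
        condensed.setdefault(name, set()).add(city)
    return condensed


def expand(condensed):
    """Yield every repaired database, obtained by choosing exactly one
    candidate city for each name (assumed repair semantics)."""
    names = sorted(condensed)
    for choice in product(*(sorted(condensed[n]) for n in names)):
        yield set(zip(names, choice))


def answer_probabilities(condensed, query):
    """Fraction of repaired databases in which each answer tuple is derived,
    assuming all repaired databases are equally likely."""
    repairs = list(expand(condensed))
    counts = {}
    for db in repairs:
        for t in query(db):  # query returns a set, so one count per repair
            counts[t] = counts.get(t, 0) + 1
    return {t: Fraction(c, len(repairs)) for t, c in counts.items()}


def names_in_rome(db):
    """Positive query (selection followed by projection): names with city Rome."""
    return {(name,) for name, city in db if city == "Rome"}


probs = answer_probabilities(condense(emp), names_in_rome)
for tup, p in sorted(probs.items()):
    print(tup, p)
```

In this toy instance there are two repaired databases. The tuple `("bob",)` is derived from both, so it is a certain answer with probability 1, while `("ann",)` is derived from only one, so it is an uncertain answer with probability 1/2. Explicit enumeration grows exponentially with the number of conflicting groups, which is why the abstract turns to approximate answers computable in polynomial time.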
