Annotation Based Query Answer over Inconsistent Database

In this paper, we introduce the concept of Annotation Based Query Answer, together with a method for computing it, for answering queries over relational databases that may violate a set of functional dependencies. In this approach, inconsistency is viewed as a property of the data and is described with annotations. More precisely, every piece of data in a relation can carry zero or more annotations, and annotations are propagated along with queries from the source to the output. With annotations, inconsistent data in both the input tables and the query answers can be marked but preserved, instead of being filtered out as in most previous work. This approach thus avoids information loss, a serious and common deficiency of most previous work in this area. To compute query answers on an annotated database, we propose an algorithm that annotates the input tables, and we redefine the five basic relational algebra operations (selection, projection, join, union and difference) so that annotations are propagated correctly as the set of valid functional dependencies changes during query processing. We also prove the soundness and completeness of the whole annotation computing system. Finally, we implement a prototype of our system and report performance experiments, which show that our approach has reasonable running time and is excellent at preserving information.
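
The idea of marking inconsistent cells and carrying the marks through a query can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names (`annotate`, `select`, `project`), the label format, and the sample `emp` table are all hypothetical, only selection and projection are shown, duplicates are not eliminated, and the sketch does not track how the set of valid functional dependencies changes during query processing.

```python
from collections import defaultdict


def annotate(rows, lhs, rhs):
    """Attach a label to the rhs cells of every row that conflicts on lhs -> rhs."""
    label = ",".join(lhs) + "->" + ",".join(rhs)  # hypothetical label format
    seen = defaultdict(set)  # lhs value -> distinct rhs values observed
    for row in rows:
        seen[tuple(row[a] for a in lhs)].add(tuple(row[a] for a in rhs))
    out = []
    for row in rows:
        conflict = len(seen[tuple(row[a] for a in lhs)]) > 1
        notes = {a: {label} for a in rhs} if conflict else {}
        out.append((dict(row), notes))
    return out


def select(annotated, predicate):
    """Keep matching rows; their annotations travel with them unchanged."""
    return [(row, notes) for row, notes in annotated if predicate(row)]


def project(annotated, attrs):
    """Keep only attrs, together with the annotations on those cells."""
    return [
        ({a: row[a] for a in attrs}, {a: notes[a] for a in attrs if a in notes})
        for row, notes in annotated
    ]


# Hypothetical table that violates the functional dependency dept -> city.
emp = [
    {"name": "Ann", "dept": "R&D", "city": "Oslo"},
    {"name": "Bob", "dept": "R&D", "city": "Rome"},
    {"name": "Cy", "dept": "Ops", "city": "Rome"},
]

annotated = annotate(emp, ["dept"], ["city"])
answer = project(select(annotated, lambda r: r["city"] == "Rome"), ["name", "city"])
for row, notes in answer:
    print(row, notes)
```

In this sketch Bob's row stays in the answer with its `city` cell marked, while Cy's row is returned without annotations, so the conflicting value is flagged rather than dropped.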

[18]  吴爱华, et al. Annotation Based Query Answer over Inconsistent Database, 2010.
