Declarative Repairing Policies for Curated KBs

Curated ontologies and semantic annotations are increasingly used in e-science to reflect the current terminology and conceptualization of scientific domains. Such curated Knowledge Bases (KBs) are usually backed by relational databases using appropriate schemas (generic or application/domain-specific) and may have to satisfy a wide range of integrity constraints. As curated KBs continuously evolve, these constraints are often violated, so KBs need to be repaired frequently. Motivated by the fact that consistency is mostly enforced manually by the scientists acting as curators, we propose a generic and personalized repairing framework to assist them in this arduous task. Our framework supports a variety of useful integrity constraints, expressed as Disjunctive Embedded Dependencies (DEDs), as well as complex curator preferences over interesting features of the resulting repairs (e.g., their size and type), which can capture diverse notions of minimality in repairs. Moreover, we propose a novel exhaustive repair-finding algorithm which, unlike existing greedy frameworks, is not sensitive to the resolution order and syntax of violated constraints and can correctly compute globally optimal repairs for different kinds of constraints and preferences. Despite its exponential nature, the performance and memory requirements of the exhaustive algorithm are experimentally demonstrated to be satisfactory for real-world curation cases, thanks to a series of optimizations.
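To make the idea of exhaustive, preference-driven repairing concrete, the following is a minimal sketch and not the algorithm or the optimizations evaluated in this work. It assumes a toy KB of facts, a single hypothetical DED-style constraint (every `subclass(a, b)` fact requires `class(a)` and `class(b)`), and a hypothetical preference (fewest changes, then fewest removals). All names in it (`violations`, `exhaustive_repairs`, `preferred`) are illustrative. Each violation offers alternative resolutions; the search branches on all of them rather than committing greedily to one, and the preference is applied only once every consistent outcome is known.

```python
def violations(kb):
    """Yield, for each violated constraint instance, its alternative resolutions.

    Hypothetical constraint: every ("subclass", a, b) fact requires
    ("class", a) and ("class", b). A resolution is (facts_to_add, facts_to_remove).
    """
    for fact in sorted(kb):
        if fact[0] == "subclass":
            for c in fact[1:]:
                if ("class", c) not in kb:
                    yield [
                        (frozenset({("class", c)}), frozenset()),  # add the missing class
                        (frozenset(), frozenset({fact})),          # or drop the subclass fact
                    ]


def exhaustive_repairs(kb):
    """Explore every resolution branch and return all (added, removed) deltas
    that lead to a consistent KB."""
    repairs = set()
    stack = [(frozenset(kb), frozenset(), frozenset())]
    while stack:
        state, added, removed = stack.pop()
        options = next(violations(state), None)
        if options is None:  # no violation left: this branch is a repair
            repairs.add((added, removed))
            continue
        for add, rem in options:  # branch on every way to resolve the violation
            stack.append(((state | add) - rem, added | add, removed | rem))
    return repairs


def preferred(repairs, key):
    """Keep only the repairs that are optimal under the given preference key."""
    best = min(key(r) for r in repairs)
    return [r for r in repairs if key(r) == best]


kb = {("class", "A"), ("subclass", "A", "B"), ("subclass", "B", "C")}
all_repairs = exhaustive_repairs(kb)

# Hypothetical curator preference: fewest changes, ties broken by fewest removals.
fewest_changes_then_removals = lambda r: (len(r[0]) + len(r[1]), len(r[1]))
best = preferred(all_repairs, fewest_changes_then_removals)
```

Because the preference is evaluated over the full set of candidate repairs, the result does not depend on which violation happens to be resolved first. The price is the exponential number of branches, which is why a practical implementation would need pruning of the kind the abstract alludes to.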
