Incomplete Data and Data Dependencies in Relational Databases

The chase has long been a central tool for analyzing dependencies and their effect on queries. It has been applied to a range of problems in database theory, including query optimization, query containment and equivalence, dependency implication, and database schema design. Recent years have seen renewed interest in the chase as a tool in several database applications, such as data exchange and integration and query answering over incomplete data.

The chase algorithm is known not to terminate in general, so its practical applicability depends on identifying cases where termination is guaranteed. The chase can also introduce null values into the database, thereby producing incomplete data. Consequently, in many scenarios where the chase is used, data dependencies and incomplete data must be handled together.

This book discusses fundamental issues concerning data dependencies and incomplete data, with a particular focus on the chase and its applications in different database areas. We report recent results on identifying conditions that guarantee chase termination. We also discuss database applications in which the chase is a central tool, with particular attention to query answering in the presence of data dependencies and to database schema design.

Table of Contents: Introduction / Relational Databases / Incomplete Databases / The Chase Algorithm / Chase Termination / Data Dependencies and Normal Forms / Universal Repairs / Chase and Database Applications
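
The two issues raised in the abstract, null introduction and possible non-termination, can be illustrated with a deliberately small sketch. It is not the book's formulation: it assumes every dependency has the restricted shape R(x, y) -> exists z. S(y, z), represents labeled nulls as strings such as "_N1", and uses a hypothetical function name `chase` with a hypothetical step budget `max_steps`.

```python
from itertools import count


def chase(facts, rules, max_steps=50):
    """Toy chase for rules of the assumed shape R(x, y) -> exists z. S(y, z).

    facts: iterable of (relation, first, second) triples.
    rules: list of (body_relation, head_relation) pairs.
    Returns (facts, terminated); terminated is False when the step
    budget ran out while a rule was still violated.
    """
    facts = set(facts)
    fresh = (f"_N{i}" for i in count(1))  # supply of labeled nulls
    steps = 0
    while True:
        # Look for a violated rule: a body fact R(a, b) with no S(b, _).
        trigger = next(
            (
                (head, b)
                for body, head in rules
                for rel, _a, b in sorted(facts)
                if rel == body
                and not any(r == head and x == b for r, x, _ in facts)
            ),
            None,
        )
        if trigger is None:
            return facts, True  # every rule is satisfied
        if steps == max_steps:
            return facts, False  # budget exhausted, still violated
        head, b = trigger
        # Repair the violation with a fresh null for the unknown value.
        facts.add((head, b, next(fresh)))
        steps += 1


# Emp(x, d) -> exists m. Dept(d, m): one null is added, then the chase stops.
done, ok = chase({("Emp", "alice", "sales")}, [("Emp", "Dept")])

# E(x, y) -> exists z. E(y, z): each new null triggers the rule again.
loop, ok2 = chase({("E", "a", "b")}, [("E", "E")], max_steps=10)
```

In the first call the result contains a tuple with a null, which is the incomplete data the abstract refers to. In the second call the rule keeps firing on the nulls it created, so only the step budget stops the loop; termination conditions of the kind the book surveys aim to rule out such rule sets in advance.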
