Possible and Certain SQL Keys

Driven by the dominance of the relational model, the requirements of modern applications, and the veracity of data, we revisit the fundamental notion of a key in relational databases with NULLs. In SQL database systems, primary key columns are NOT NULL by default. NULL columns may occur in unique constraints, which guarantee uniqueness only for tuples that have no null markers in any of the columns involved, and which therefore serve a different function than primary keys. We investigate the notions of possible and certain keys, which are keys that hold in some or all possible worlds that can originate from an SQL table, respectively. Possible keys coincide with the unique constraint of SQL, and thus provide a semantics for its syntactic definition in the SQL standard. Certain keys extend primary keys to include NULL columns, and thus form a sufficient and necessary condition to identify tuples uniquely, whereas primary keys are only sufficient for that purpose. In addition to basic characterization, axiomatization, and simple discovery approaches for possible and certain keys, we investigate the existence and construction of Armstrong tables, and describe an indexing scheme for enforcing certain keys. Our experiments show that certain keys with NULLs do occur in real-world databases, and that related computational problems can be solved efficiently. Certain keys are therefore semantically well-founded and able to maintain data quality in the form of Codd's entity integrity rule while handling the requirements of modern applications, that is, higher volumes of incomplete data from different formats.
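
To make the distinction between the two notions concrete, the following is a minimal Python sketch, not the paper's algorithm or indexing scheme. It assumes the usual pairwise reading of the definitions: a possible key fails only if two tuples agree on every key column with no NULL involved (the behaviour of SQL's unique constraint described above), while a certain key fails as soon as two tuples could be made equal on the key columns by some replacement of NULLs. The function names, the use of `None` for NULL, and the sample rows are all illustrative assumptions.

```python
from itertools import combinations

NULL = None  # assumed stand-in for the SQL null marker


def strongly_similar(t, u, cols):
    """True if t and u are non-NULL and equal on every column in cols."""
    return all(t[c] is not NULL and t[c] == u[c] for c in cols)


def weakly_similar(t, u, cols):
    """True if, on every column in cols, t and u are equal or one is NULL."""
    return all(t[c] is NULL or u[c] is NULL or t[c] == u[c] for c in cols)


def is_possible_key(rows, cols):
    """Holds if no two rows are strongly similar on cols (UNIQUE semantics)."""
    return not any(strongly_similar(t, u, cols) for t, u in combinations(rows, 2))


def is_certain_key(rows, cols):
    """Holds if no two rows are weakly similar on cols."""
    return not any(weakly_similar(t, u, cols) for t, u in combinations(rows, 2))


# Hypothetical sample table: the second row has a NULL department.
rows = [
    {"name": "Ann", "dept": "R&D"},
    {"name": "Ann", "dept": NULL},
    {"name": "Bob", "dept": "R&D"},
]

possible = is_possible_key(rows, ["name", "dept"])
certain = is_certain_key(rows, ["name", "dept"])
```

On this sample, `(name, dept)` passes the possible-key check because the NULL prevents the two "Ann" rows from being strongly similar, but it fails the certain-key check because replacing the NULL with "R&D" would produce a duplicate. The pairwise scan is quadratic in the number of rows and is meant only to illustrate the semantics.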
