Empirical evidence for the usefulness of Armstrong tables in the acquisition of semantically meaningful SQL constraints

SQL schema designs result from methodologies such as UML, Entity-Relationship models, description logics, or relational normalization. Independently of the methodology, academia and industry promote sample data as a means to consolidate the schema designs produced. SQL constraints are an abstract, standard-compliant encoding of the designers' perception of the semantics of an application domain. Armstrong tables can visualize SQL constraints concisely, in the sense that they satisfy all constraints perceived as meaningful and violate all constraints perceived as meaningless. Using new empirical measures, we investigate how Armstrong tables help design teams recognize domain semantics. Extensive experiments confirm that users of Armstrong tables are likely to recognize domain semantics they would otherwise overlook. Armstrong tables therefore complement existing schema design methodologies in producing quality schemata that process data efficiently.
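The defining property of an Armstrong table (satisfy every meaningful constraint, violate every other one) can be illustrated with a minimal sketch. It makes simplifying assumptions that are not taken from the paper: the only constraint class is classical functional dependencies, the table contains no NULL markers or duplicate rows, and the schema, the sample rows, and the function names `holds`, `closure`, and `is_armstrong` are hypothetical.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # Rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs_tuple, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def is_armstrong(rows, schema, fds):
    """Check that rows satisfy exactly the dependencies implied by fds."""
    for size in range(len(schema) + 1):
        for lhs in combinations(schema, size):
            for rhs in schema:
                implied = rhs in closure(lhs, fds)
                if holds(rows, lhs, rhs) != implied:
                    return False
    return True


# Hypothetical schema with one dependency perceived as meaningful: A -> B.
schema = ("A", "B", "C")
fds = [(("A",), "B")]
rows = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c2"},  # agrees with row 1 on A, B: violates AB -> C
    {"A": "a2", "B": "b1", "C": "c1"},  # agrees with row 1 on B, C: violates BC -> A
    {"A": "a3", "B": "b2", "C": "c1"},  # agrees with row 1 on C only: violates C -> B
]

print(is_armstrong(rows, schema, fds))      # True
print(is_armstrong(rows[:3], schema, fds))  # False: B is constant, so {} -> B holds
```

In this sketch, dropping the fourth row makes the table satisfy a dependency that was not declared meaningful, which is the kind of discrepancy a design team could spot by inspecting the sample.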
