Deciding implication for functional dependencies in complex-value databases

Modern applications increasingly require the storage of data beyond relational structure. The challenge of providing well-founded data models that can handle complex objects such as lists, sets, multisets, unions and references has not yet been met in a completely satisfactory way. The success of such data models will greatly depend on the existence of automated database design techniques that generalise achievements from relational databases. In this paper, we study the implication problem of functional dependencies (FDs) in the presence of records, sets, multisets and lists. Database schemata are defined as nested attributes, database instances as nested relations, and FDs are defined in terms of subattributes of the database schema. The expressiveness of these FDs deviates fundamentally from previous approaches in other data models, including the nested relational data model and XML. The implication problem is to decide, for an arbitrary database schema and an arbitrary set Σ ∪ {σ} of FDs defined on that schema, whether every database instance that satisfies all FDs in Σ also satisfies σ. The difficulty in generalising the solution from the relational data model to the presence of sets and multisets is caused by the fact that the value on the join of subattributes is no longer determined by the values on the subattributes. Based on the notion of a unit, we propose to decompose the database schema in such a way that the closure of a set of nested attributes can be computed on the components of the schema. The implementation of the algorithm is based on a representation theorem for Brouwerian algebras. The main contribution is the proof that the algorithm works correctly and runs in time polynomial in the size of the input. Defining the size of the input is not trivial, since the measure should both generalise the one that is used for relational databases and do justice to the presence of sets and multisets.
Our solution to the implication problem also allows us to solve other important problems that occur in database design. We present polynomial-time algorithms to determine non-redundant covers of sets of FDs, and to decide whether a given set of subattributes forms a superkey.
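
For orientation, the following is a minimal sketch of the classical relational baseline that the paper generalises, not the paper's own algorithm. It assumes flat attribute names (plain strings) and computes the attribute-set closure by fixpoint iteration; implication, the superkey test and a non-redundant cover then reduce to closure computations. All function names (`closure`, `implies`, `is_superkey`, `non_redundant_cover`) are hypothetical and chosen for illustration. In the presence of sets and multisets this simple procedure does not carry over directly, because the value on a join of subattributes is no longer determined by the values on the subattributes.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# An FD X -> Y over flat (relational) attributes, stored as (X, Y).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build an FD from two collections of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Classical relational closure: all attributes determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its left-hand side is fully contained.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds: List[FD], candidate: FD) -> bool:
    """Decide whether the FDs in fds imply candidate (relational case only)."""
    lhs, rhs = candidate
    return rhs <= closure(lhs, fds)


def is_superkey(attrs: Iterable[str], schema: Iterable[str], fds: List[FD]) -> bool:
    """attrs is a superkey if its closure covers the whole schema."""
    return set(schema) <= closure(attrs, fds)


def non_redundant_cover(fds: List[FD]) -> List[FD]:
    """Drop every FD that is implied by the remaining ones."""
    cover = list(fds)
    i = 0
    while i < len(cover):
        rest = cover[:i] + cover[i + 1:]
        if implies(rest, cover[i]):
            cover = rest  # cover[i] was redundant; do not advance i
        else:
            i += 1
    return cover


sigma = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(closure({"A"}, sigma))
print(is_superkey({"A"}, {"A", "B", "C"}, sigma))
print(non_redundant_cover(sigma))
```

The naive fixpoint loop above is quadratic in the number of FDs; it is meant only to make the notions of closure, implication, superkey and non-redundant cover concrete for the flat relational case.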
