Tractable database design and datalog abduction through bounded treewidth

Given that most elementary problems in database design are NP-hard, the currently used database design algorithms produce suboptimal results. For example, the current 3NF decomposition algorithms may continue further decomposing a relation even though it is already in 3NF. In this paper we study database design problems whose sets of functional dependencies have bounded treewidth. For such sets, we develop polynomial-time and highly parallelizable algorithms for a number of central database design problems such as:
* primality of an attribute;
* 3NF-test for a relational schema or subschema;
* BCNF-test for a subschema.
In order to define the treewidth of a relational schema, we associate a hypergraph with it. There are two main ways of defining the treewidth of a hypergraph H: one is via the primal graph of H and the other is via the incidence graph of H. Our algorithms apply to the case where the primal graph is considered. However, we also show that the tractability results still hold when the incidence graph is considered instead. It turns out that our results have interesting applications to logic-based abduction. By the well-known relationship between the primality problem in database design and the relevance problem in propositional abduction, our new algorithms and tractability results can be easily carried over from the former field to the latter. Moreover, we show how these tractability results can be further extended from propositional abduction to abductive diagnosis based on non-ground datalog.
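
To make the notions in the abstract concrete, the following minimal Python sketch illustrates the primality problem and the two graphs associated with a hypergraph. It is not the paper's algorithm: `is_prime` is a naive exponential-time baseline, not the polynomial-time method for bounded treewidth. The sketch also assumes, as an illustration only, that the hypergraph has one vertex per attribute and one hyperedge per functional dependency (the attributes occurring in it). All function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_prime(attribute, schema, fds):
    """Naive test: an attribute is prime iff it belongs to some minimal key.

    Enumerates all attribute sets containing `attribute`, so it is exponential
    in the schema size (brute-force baseline, not the paper's algorithm).
    """
    others = sorted(schema - {attribute})
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            candidate = set(subset) | {attribute}
            if closure(candidate, fds) != schema:
                continue  # not a superkey
            # A superkey is a key if dropping any single attribute breaks it.
            if all(closure(candidate - {a}, fds) != schema for a in candidate):
                return True
    return False

def primal_graph(fds):
    """Edges between attributes that occur together in some FD (assumed hyperedge)."""
    edges = set()
    for lhs, rhs in fds:
        for a, b in combinations(sorted(lhs | rhs), 2):
            edges.add((a, b))
    return edges

def incidence_graph(fds):
    """Bipartite edges between each attribute and the index of each FD containing it."""
    return {(a, i) for i, (lhs, rhs) in enumerate(fds) for a in lhs | rhs}

# Example schema R = {A, B, C} with FDs A -> B and B -> C; the only key is {A}.
schema = {"A", "B", "C"}
fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
```

In this example `is_prime("A", schema, fds)` holds while `B` and `C` are not prime, and the primal graph is the path A - B - C. The treewidth referred to in the abstract would then be computed on one of these two graphs.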
