Bag Query Containment and Information Theory

The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is far less well understood under bag semantics. In particular, it is a long-standing open question whether the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and conjunctive query containment under bag semantics. These connections are established using information inequalities, which are considered to be the laws of information theory. Our first main result asserts that deciding the validity of a generalization of information inequalities is many-one equivalent to the restricted case of conjunctive query containment in which the containing query is acyclic; thus, either both of these problems are decidable or both are undecidable. Our second main result identifies a new decidable case of the conjunctive query containment problem under bag semantics. Specifically, we give an exponential-time algorithm for conjunctive query containment under bag semantics, provided that the containing query is chordal and admits a simple junction tree.
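To make the gap between set and bag semantics concrete, the following minimal sketch counts homomorphisms from a Boolean conjunctive query into a database, which is the multiplicity of the query's answer when the database relations are sets and the query is evaluated with duplicates (bag-set semantics). It is an illustration only, not the algorithm of the paper: the function name `count_homomorphisms`, the query encoding, and the example queries and database are all assumptions introduced here. The two queries are equivalent under set semantics (both merely ask whether R is non-empty), yet their answer multiplicities differ.

```python
from itertools import product


def count_homomorphisms(atoms, database):
    """Count variable assignments that map every query atom to a database fact.

    atoms: list of (relation_name, tuple_of_variable_names)   (hypothetical encoding)
    database: dict mapping relation_name -> set of tuples of constants
    """
    variables = sorted({v for _, args in atoms for v in args})
    domain = sorted({c for facts in database.values() for fact in facts for c in fact})
    count = 0
    # Brute force over all assignments; exponential in the number of variables.
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(
            tuple(assignment[v] for v in args) in database.get(rel, set())
            for rel, args in atoms
        ):
            count += 1
    return count


# Illustrative Boolean queries: Q1() :- R(x, y), R(x, z)   and   Q2() :- R(x, y)
q1 = [("R", ("x", "y")), ("R", ("x", "z"))]
q2 = [("R", ("x", "y"))]
db = {"R": {("a", "b"), ("a", "c")}}

print(count_homomorphisms(q1, db))  # multiplicity of Q1 on db
print(count_homomorphisms(q2, db))  # multiplicity of Q2 on db
```

On this database Q1 has strictly more homomorphisms than Q2, so Q1 is not contained in Q2 under bag semantics even though the two are equivalent under set semantics. A single database can only refute containment; establishing it requires reasoning over all databases, which is where the difficulty of the problem lies.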
