Bag Query Containment and Information Theory

The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is far less understood under bag semantics. In particular, it is a long-standing open question whether the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and conjunctive query containment under bag semantics. These connections are established using information inequalities, which are considered to be the laws of information theory. Our first main result asserts that deciding the validity of a generalization of information inequalities is many-one equivalent to the restricted case of conjunctive query containment in which the containing query is acyclic; thus, either both problems are decidable or both are undecidable. Our second main result identifies a new decidable case of the conjunctive query containment problem under bag semantics. Specifically, we give an exponential-time algorithm for conjunctive query containment under bag semantics, provided the containing query is chordal and admits a simple junction tree.
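To make the containment question concrete, here is a minimal sketch. It assumes Boolean conjunctive queries over set-valued relations, where the multiplicity of a query's answer on a database equals the number of homomorphisms from the query into that database. Under that assumption, Q1 is bag-contained in Q2 when Q1 never has more homomorphisms than Q2, on any database. The helpers `count_homomorphisms` and `refutes_bag_containment` are hypothetical names chosen for illustration, not code from the paper. The sketch can only test one given database, so it can exhibit a counterexample but is not a decision procedure. Whether such a procedure exists in general is exactly the open question above.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: an atom is (relation_name, tuple_of_variables),
# and a database maps each relation name to a set of constant tuples.
Atom = Tuple[str, Tuple[str, ...]]
Database = Dict[str, Set[tuple]]


def count_homomorphisms(query: List[Atom], db: Database) -> int:
    """Brute-force count of variable assignments that map every atom into db."""
    variables = sorted({v for _, args in query for v in args})
    domain = sorted({c for rel in db.values() for t in rel for c in t})
    count = 0
    # Try every assignment of domain values to the query's variables.
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(tuple(assignment[v] for v in args) in db.get(name, set())
               for name, args in query):
            count += 1
    return count


def refutes_bag_containment(q1: List[Atom], q2: List[Atom], db: Database) -> bool:
    """True if db is a witness that q1 is NOT bag-contained in q2."""
    return count_homomorphisms(q1, db) > count_homomorphisms(q2, db)


if __name__ == "__main__":
    db = {"R": {(1, 1), (1, 2), (2, 1)}}
    path2 = [("R", ("x", "y")), ("R", ("y", "z"))]  # R(x,y), R(y,z)
    edge = [("R", ("x", "y"))]                       # R(x,y)
    print(count_homomorphisms(path2, db), count_homomorphisms(edge, db))  # prints "5 3"
    print(refutes_bag_containment(path2, edge, db))  # prints "True"
```

The example shows why bag semantics differs from set semantics. The atom R(x,y) maps into R(x,y), R(y,z), so the two-edge path is contained in the single edge under set semantics. On the database above, however, the path has 5 homomorphisms and the edge only 3, so bag containment fails. The exhaustive search is exponential in the number of variables. It is meant only for toy instances.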