The dichotomy of probabilistic inference for unions of conjunctive queries

We study the complexity of evaluating a query on a probabilistic database. We consider unions of conjunctive queries (UCQ), which are equivalent to positive, existential first-order logic sentences, and also to nonrecursive datalog programs. The tuples in the database are independent random events. We prove the following dichotomy theorem: for every UCQ query, either its probability can be computed in time polynomial in the size of the database, or computing it is #P-hard. Our result also has applications to the problem of computing the probability of positive Boolean expressions, and establishes a dichotomy for classes of such expressions based on their structure. For the tractable case, we give a very simple algorithm that alternates between two steps: applying the inclusion/exclusion formula, and removing one existential variable. A key and novel feature of this algorithm is that it avoids computing terms that cancel out in the inclusion/exclusion formula; in other words, it computes only those terms whose Möbius function in an appropriate lattice is nonzero. We show that this simple feature is a key ingredient needed to ensure completeness. For the hardness proof, we give a reduction from the counting problem for positive, partitioned 2CNF, which is known to be #P-complete. The hardness proof is nontrivial, and combines techniques from logic, classical algebra, and analysis.
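
The cancellation of inclusion/exclusion terms can be illustrated on a small positive Boolean expression. The following is a minimal propositional sketch, not the lifted algorithm of the paper: it assumes a positive DNF over independent variables, and the function names (`brute_force`, `grouped_coefficients`, `inclusion_exclusion`) and the example expression are hypothetical choices made for illustration. It groups the inclusion/exclusion terms by the conjunction they multiply, skips every group whose summed coefficient is zero, and checks the result against enumeration of all possible worlds.

```python
from collections import defaultdict
from itertools import combinations, product


def brute_force(dnf, prob):
    """Sum the weights of all possible worlds in which the DNF is true."""
    variables = sorted(prob)
    total = 0.0
    for bits in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        if any(all(world[v] for v in term) for term in dnf):
            weight = 1.0
            for v in variables:
                weight *= prob[v] if world[v] else 1.0 - prob[v]
            total += weight
    return total


def grouped_coefficients(dnf):
    """Inclusion/exclusion signs, summed per distinct conjunction.

    Each non-empty subset of DNF terms contributes +1 or -1 to the
    conjunction formed by the union of its variables.
    """
    coeff = defaultdict(int)
    for k in range(1, len(dnf) + 1):
        for subset in combinations(dnf, k):
            conjunction = frozenset().union(*subset)
            coeff[conjunction] += (-1) ** (k + 1)
    return dict(coeff)


def inclusion_exclusion(dnf, prob):
    """Evaluate only the conjunctions whose summed coefficient is nonzero."""
    total = 0.0
    for conjunction, c in grouped_coefficients(dnf).items():
        if c == 0:
            continue  # this term cancels out, so it is never evaluated
        p = 1.0
        for v in conjunction:
            p *= prob[v]  # variables are assumed independent
        total += c * p
    return total


# Hypothetical example: (a AND b) OR (b AND c) OR (c AND d)
dnf = [frozenset("ab"), frozenset("bc"), frozenset("cd")]
prob = {"a": 0.5, "b": 0.4, "c": 0.7, "d": 0.2}

coeff = grouped_coefficients(dnf)
print(coeff[frozenset("abcd")])  # summed coefficient of a AND b AND c AND d
print(inclusion_exclusion(dnf, prob), brute_force(dnf, prob))
```

In this example the conjunction of all four variables arises twice, once with sign -1 (from the first and third terms) and once with sign +1 (from all three terms), so its summed coefficient is zero and it is skipped. The sketch still enumerates every subset of terms to find the coefficients, so it illustrates only which terms cancel, not how to obtain them efficiently.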
