Provenance for Probabilistic Logic Programs

Despite the emergence of probabilistic logic programming (PLP) languages for data driven applications, there are currently no debugging tools based on provenance for PLP programs. In this paper, we propose a novel provenance model and system, called P3 (Provenance for Probabilistic logic Programs) for analyzing PLP programs. P3 enables four types of provenance queries: traditional explanation queries, queries for finding the set of most important derivations within an approximate error, top-K most influential queries, and modification queries that enable us to modify tuple probabilities with fewest modifications to program or input data. We apply these queries into real-world scenarios and present theoretical analysis and practical algorithms for such queries. We have developed a prototype of P3, and our evaluation on real-world data demonstrates that the system can support a wide-range of provenance queries with explainable results. Moreover, the system maintains provenance and execute queries efficiently with low overhead.

[1]  Ion Stoica,et al.  Implementing declarative overlays , 2005, SOSP '05.

[2]  Dan Suciu,et al.  Efficient query evaluation on probabilistic databases , 2004, The VLDB journal.

[3]  Dan Suciu,et al.  Efficient query evaluation on probabilistic databases , 2004, The VLDB Journal.

[4]  Jean H. Gallier,et al.  Logic for Computer Science: Foundations of Automatic Theorem Proving , 1985 .

[5]  Leslie G. Valiant,et al.  The Complexity of Enumeration and Reliability Problems , 1979, SIAM J. Comput..

[6]  Stuart J. Russell,et al.  BLOG: Probabilistic Models with Unknown Objects , 2005, IJCAI.

[7]  Parag Agrawal,et al.  Trio: a system for data, uncertainty, and lineage , 2006, VLDB.

[8]  Jian Li,et al.  Sensitivity analysis and explanations for robust query evaluation in probabilistic databases , 2011, SIGMOD '11.

[9]  Dan Suciu,et al.  SlimShot: In-Database Probabilistic Inference for Knowledge Bases , 2016, Proc. VLDB Endow..

[10]  Joshua B. Tenenbaum,et al.  Church: a language for generative models , 2008, UAI.

[11]  Christos Faloutsos,et al.  Edge Weight Prediction in Weighted Signed Networks , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[12]  Christos Faloutsos,et al.  REV2: Fraudulent User Prediction in Rating Platforms , 2018, WSDM.

[13]  Randal E. Bryant,et al.  Graph-Based Algorithms for Boolean Function Manipulation , 1986, IEEE Transactions on Computers.

[14]  Matthew Richardson,et al.  Markov logic networks , 2006, Machine Learning.

[15]  Taisuke Sato,et al.  New Advances in Logic-Based Probabilistic Modeling by PRISM , 2008, Probabilistic Inductive Logic Programming.

[16]  Christopher Ré,et al.  Approximate lineage for probabilistic databases , 2008, Proc. VLDB Endow..

[17]  Chitta Baral,et al.  Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering , 2018, AAAI.

[18]  Christopher Ré,et al.  Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS , 2011, Proc. VLDB Endow..

[19]  David Poole,et al.  The Independent Choice Logic and Beyond , 2008, Probabilistic Inductive Logic Programming.

[20]  S. Muggleton Stochastic Logic Programs , 1996 .

[21]  Stephen H. Bach Hinge-Loss Markov Random Fields and Probabilistic Soft Logic: A Scalable Approach to Structured Prediction , 2015, J. Mach. Learn. Res..

[22]  Ion Stoica,et al.  Declarative routing: extensible routing with declarative queries , 2005, SIGCOMM '05.

[23]  Ion Stoica,et al.  Declarative networking: language, execution and optimization , 2006, SIGMOD Conference.

[24]  Luc De Raedt,et al.  ProbLog: A Probabilistic Prolog and its Application in Link Discovery , 2007, IJCAI.

[25]  Margaret Mitchell,et al.  VQA: Visual Question Answering , 2015, International Journal of Computer Vision.

[26]  Laurent Vanbever,et al.  Bayonet: probabilistic inference for networks , 2018, PLDI.

[27]  Xiaozhou Li,et al.  Efficient querying and maintenance of network provenance at internet-scale , 2010, SIGMOD Conference.

[28]  Luc De Raedt,et al.  Inference and learning in probabilistic logic programs using weighted Boolean formulas , 2013, Theory and Practice of Logic Programming.

[29]  Taisuke Sato,et al.  A Statistical Learning Method for Logic Programs with Distribution Semantics , 1995, ICLP.

[30]  Richard M. Karp,et al.  Monte-Carlo algorithms for enumeration and reliability problems , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[31]  Val Tannen,et al.  Provenance semirings , 2007, PODS.