Synthesizing Datalog Programs Using Numerical Relaxation

The problem of learning logical rules from examples arises in diverse fields, including program synthesis, logic programming, and machine learning. Existing approaches involve either solving computationally difficult combinatorial problems or performing parameter estimation in complex statistical models. In this paper, we present Difflog, a technique that extends the logic programming language Datalog to the continuous setting. By attaching real-valued weights to the individual rules of a Datalog program, we naturally associate numerical values with the individual conclusions of the program. Analogous to the strategy of numerical relaxation in optimization problems, we can first determine the rule weights that cause the best agreement between the training labels and the induced values of output tuples, and subsequently recover the classical discrete-valued target program from the continuous optimum. We evaluate Difflog on a suite of 34 benchmark problems from recent literature in knowledge discovery, formal verification, and database query-by-example, and demonstrate significant improvements in learning complex programs with recursive rules, invented predicates, and relations of arbitrary arity.
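To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of the weighted evaluation the abstract describes: each Datalog rule carries a real-valued weight, and a derived tuple's value combines weights multiplicatively along a derivation and takes the maximum across alternative derivations. It is illustrated on transitive closure; the function name and the weights `w_base` and `w_step` are hypothetical.

```python
def weighted_transitive_closure(edges, w_base, w_step, max_iters=50):
    """Fixpoint evaluation of a weighted Datalog program:
         path(x, y) :- edge(x, y).               (weight w_base)
         path(x, z) :- path(x, y), edge(y, z).   (weight w_step)
       A tuple's value is the product of rule weights along a
       derivation, maximized over all derivations."""
    # Base rule: every edge yields a path tuple with value w_base.
    val = {e: w_base for e in edges}
    for _ in range(max_iters):
        changed = False
        # Recursive rule: extend known paths by one edge.
        for (x, y), v in list(val.items()):
            for (u, z) in edges:
                if u == y:
                    cand = v * w_step
                    if cand > val.get((x, z), 0.0):
                        val[(x, z)] = cand
                        changed = True
        if not changed:  # fixpoint reached
            break
    return val

vals = weighted_transitive_closure({("a", "b"), ("b", "c"), ("c", "d")},
                                   w_base=0.9, w_step=0.8)
```

Because tuple values vary smoothly with the rule weights, an optimizer can adjust the weights to match training labels on output tuples, and rules whose weights fall toward zero can then be discarded to recover a discrete program.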
