Stochastic Logic Programs

One way to represent a machine learning algorithm's bias over the hypothesis and instance space is as a pair of probability distributions. This approach has been taken both within Bayesian learning schemes and the framework of U-learnability. However, it is not obvious how an Inductive Logic Programming (ILP) system should best be provided with a probability distribution. This paper extends the results of a previous paper by the author which introduced stochastic logic programs as a means of providing a structured definition of such a probability distribution. Stochastic logic programs are a generalisation of stochastic grammars. A stochastic logic program consists of a set of labelled clauses p : C, where p is from the interval [0, 1] and C is a range-restricted definite clause. A stochastic logic program P has a distributional semantics, that is, one which assigns a probability distribution to the atoms of each predicate in the Herbrand base of the clauses in P. These probabilities are assigned to atoms according to an SLD-resolution strategy which employs a stochastic selection rule. It is shown that the probabilities can be computed directly for fail-free logic programs and by normalisation for arbitrary logic programs. The stochastic proof strategy can be used to provide three distinct functions: 1) a method of sampling from the Herbrand base, which can be used to provide selected targets or example sets for ILP experiments; 2) a measure of the information content of examples or hypotheses, which can be used to guide the search in an ILP system; and 3) a simple method for conditioning a given stochastic logic program on samples of data. Functions 1) and 3) are used to measure the generality of hypotheses in the ILP system Progol4.2. This supports an implementation of a Bayesian technique for learning from positive examples only. This paper is an extension of a paper with the same title which appeared in [12].
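
As a concrete illustration of function 1), the following is a minimal Python sketch (not taken from the paper) of sampling by stochastic clause selection from a toy fail-free stochastic logic program that encodes a one-symbol stochastic grammar. The clause labels 0.3 and 0.7, the predicate s/1 and the helper sample_s are illustrative assumptions, not part of the paper.

import random

# Toy stochastic logic program (an assumed example; the labels for s/1 sum to 1):
#   0.3 : s([a|T]) :- s(T).
#   0.7 : s([]).
P_RECURSE = 0.3   # label of the recursive clause
P_STOP = 0.7      # label of the base clause

def sample_s():
    """Sample one ground atom s(String) by stochastic clause selection.

    For a fail-free program, the probability of the sampled atom is the
    product of the labels of the clauses used in its derivation.
    """
    symbols, prob = [], 1.0
    while True:
        if random.random() < P_RECURSE:
            symbols.append("a")       # selected 0.3 : s([a|T]) :- s(T).
            prob *= P_RECURSE
        else:
            prob *= P_STOP            # selected 0.7 : s([]).
            return "s(%s)" % symbols, prob

# Example: sample_s() may return ("s(['a', 'a'])", 0.063), i.e. 0.3 * 0.3 * 0.7.

Taking the negative base-2 logarithm of the returned probability gives the kind of information-content measure referred to in function 2), and repeated sampling of this sort yields the example sets referred to in function 1).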

[1]  H. Jeffreys. Logical Foundations of Probability. Nature, 1952.

[2]  J. Lukasiewicz. Logical foundations of probability theory. 1970.

[3]  Glenn Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.

[4]  William F. Clocksin et al. Programming in Prolog. Springer Berlin Heidelberg, 1987.

[5]  Leslie G. Valiant. A theory of the learnable. STOC '84, 1984.

[6]  Nils J. Nilsson. Probabilistic Logic. Artificial Intelligence, 1986.

[7]  Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989.

[8]  Ronald Fagin et al. Uncertainty, belief, and probability. IJCAI, 1991.

[9]  David Haussler et al. Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension. COLT '91, 1991.

[10]  Michael Kearns et al. Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension. IJCNN International Joint Conference on Neural Networks, 1992.

[11]  Stephen Muggleton. Bayesian inductive logic programming. COLT '94, 1994.

[12]  Luc De Raedt et al. Inductive Logic Programming: Theory and Methods. Journal of Logic Programming, 1994.

[13]  Stephen Muggleton et al. A Learnability Model for Universal Representations and Its Application to Top-down Induction of Decision Trees. Machine Intelligence 15, 1995.

[14]  Taisuke Sato. A Statistical Learning Method for Logic Programs with Distribution Semantics. ICLP, 1995.

[15]  Stephen Muggleton. Learning from Positive Data. Inductive Logic Programming Workshop, 1996.

[16]  Thomas Lukasiewicz. Probabilistic Logic Programming. ECAI, 1998.