Parameter Learning in Probabilistic Databases: A Least Squares Approach

We introduce the problem of learning the parameters of the probabilistic database ProbLog. Given the observed success probabilities of a set of queries, we compute fact probabilities that yield a low approximation error on the training examples as well as on unseen examples. Assuming Gaussian error terms on the observed success probabilities, this naturally leads to a least squares optimization problem. Our approach, called LeProbLog, can learn from queries, from proofs, or from both simultaneously. This makes it flexible and allows faster training in domains where proofs are available. Experiments on real-world data show the usefulness and effectiveness of this least squares calibration of probabilistic databases.
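To make the least squares formulation concrete, the following is a minimal sketch rather than the LeProbLog implementation. It assumes that each query is given as a small set of proofs (each proof a set of fact indices), that success probabilities can be computed by brute-force enumeration of possible worlds, and that fact probabilities are parameterized as p_j = sigmoid(a_j) and fitted by plain gradient descent on the mean squared error. The names `success_prob`, `mse`, `fit`, and the toy data in `EXAMPLES` are hypothetical and chosen for illustration only.

```python
import itertools
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def success_prob(proofs, probs, fixed=None):
    """Probability that at least one proof holds in a random world.

    Brute-force enumeration over all truth assignments of the facts
    (only feasible for tiny examples). `fixed` optionally clamps some
    facts to True/False, which gives a conditional probability.
    """
    fixed = fixed or {}
    total = 0.0
    for world in itertools.product([False, True], repeat=len(probs)):
        if any(world[j] != v for j, v in fixed.items()):
            continue
        if not any(all(world[j] for j in proof) for proof in proofs):
            continue
        weight = 1.0
        for j, is_true in enumerate(world):
            if j not in fixed:
                weight *= probs[j] if is_true else 1.0 - probs[j]
        total += weight
    return total


def mse(examples, probs):
    """Mean squared error between predicted and observed success probabilities."""
    return sum((success_prob(q, probs) - t) ** 2 for q, t in examples) / len(examples)


def fit(examples, n_facts, steps=1000, lr=1.0):
    """Gradient descent on the MSE; step count and learning rate are assumptions."""
    a = [0.0] * n_facts  # every p_j starts at sigmoid(0) = 0.5
    for _ in range(steps):
        probs = [sigmoid(x) for x in a]
        grad = [0.0] * n_facts
        for proofs, target in examples:
            err = success_prob(proofs, probs) - target
            for j in range(n_facts):
                # P(q) is linear in p_j, so dP/dp_j = P(q | f_j) - P(q | not f_j).
                d_p = (success_prob(proofs, probs, {j: True})
                       - success_prob(proofs, probs, {j: False}))
                # Chain rule through the sigmoid: dp_j/da_j = p_j * (1 - p_j).
                grad[j] += 2.0 * err * d_p * probs[j] * (1.0 - probs[j]) / len(examples)
        a = [x - lr * g for x, g in zip(a, grad)]
    return [sigmoid(x) for x in a]


# Toy training set (made up): three facts, four queries given by their proofs,
# with targets generated from the fact probabilities 0.8, 0.6, 0.3.
EXAMPLES = [
    ([{0, 1}], 0.48),    # needs facts 0 and 1
    ([{0}, {2}], 0.86),  # fact 0 or fact 2
    ([{1, 2}], 0.18),    # needs facts 1 and 2
    ([{0}], 0.80),       # fact 0 alone
]

learned = fit(EXAMPLES, n_facts=3)
```

Calling `mse(EXAMPLES, learned)` afterwards gives the remaining training error. The sigmoid reparameterization is one simple way to keep each probability inside (0, 1) during unconstrained descent; the enumeration in `success_prob` is exponential in the number of facts and stands in for whatever exact or approximate inference a real system would use.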
