Maximum Entropy Modeling with Clausal Constraints

We present the learning system Maccent, which addresses the novel task of stochastic MAximum ENTropy modeling with Clausal Constraints. The maximum entropy method is a Bayesian method based on the principle that the target stochastic model should be as uniform as possible, subject to known constraints. Maccent incorporates clausal constraints that are based on the evaluation of Prolog clauses in examples represented as Prolog programs. We build on an existing maximum-likelihood approach to maximum entropy modeling, which we upgrade along two dimensions: (1) Maccent can handle larger search spaces, thanks to a partial ordering defined on the space of clausal constraints, and (2) it uses a richer first-order logic format. In comparison with other inductive logic programming systems, Maccent seems to be the first that explicitly constructs a conditional probability distribution \(p(C|I)\) based on an empirical distribution \(\tilde p(C|I)\), where \(p(C|I)\) (resp. \(\tilde p(C|I)\)) denotes the induced (resp. observed) probability that an instance \(I\) belongs to a class \(C\). First experiments indicate that Maccent may be useful for prediction, and for classification in cases where the induced model is to be combined with other stochastic information sources.
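The core idea of conditional maximum entropy modeling can be illustrated with a minimal sketch (this is not Maccent itself): weights are fit so that the model's expected feature counts match the empirical counts, yielding the most uniform distribution \(p(C|I)\) consistent with those constraints. The boolean feature functions below are hypothetical stand-ins for clausal constraints, and plain gradient ascent on the conditional log-likelihood is used in place of the iterative scaling procedures common in the maximum-likelihood literature.

```python
import math

def maxent_probs(weights, features, instance, classes):
    """p(c|i) proportional to exp(sum_j w_j * f_j(i, c))."""
    scores = [math.exp(sum(w * f(instance, c) for w, f in zip(weights, features)))
              for c in classes]
    z = sum(scores)  # normalizing constant
    return [s / z for s in scores]

def train(data, features, classes, lr=0.5, iters=500):
    """Gradient ascent on conditional log-likelihood: the gradient for each
    feature is its empirical count minus its expected count under the model."""
    weights = [0.0] * len(features)
    for _ in range(iters):
        grad = [0.0] * len(features)
        for instance, true_cls in data:
            probs = maxent_probs(weights, features, instance, classes)
            for j, f in enumerate(features):
                grad[j] += f(instance, true_cls)           # empirical count
                grad[j] -= sum(p * f(instance, c)          # expected count
                               for p, c in zip(probs, classes))
        weights = [w + lr * g / len(data) for w, g in zip(weights, grad)]
    return weights

# Toy data: each instance is an attribute dict, labeled with one of two classes.
data = [({"red": 1}, "pos"), ({"red": 1}, "pos"),
        ({"red": 0}, "neg"), ({"red": 0}, "neg")]
classes = ["pos", "neg"]
# Hypothetical boolean features playing the role of clausal constraints.
features = [lambda i, c: 1.0 if i["red"] and c == "pos" else 0.0,
            lambda i, c: 1.0 if not i["red"] and c == "neg" else 0.0]

weights = train(data, features, classes)
p = maxent_probs(weights, features, {"red": 1}, classes)
```

After training, the model assigns most of the conditional probability mass for a "red" instance to the class "pos", since that is what the matched constraints demand; attributes on which no constraint fires are left maximally uncertain.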
