Structure Learning in Markov Logic Networks

Markov logic networks (MLNs) [86, 24] are a powerful representation combining first-order logic and probability. An MLN attaches weights to first-order formulas and views these as templates for features of Markov networks. Learning MLN structure consists of learning both formulas and their weights. This is a challenging problem because the search space of formulas is super-exponential, and because evaluating candidate formulas requires repeatedly learning their weights, a process that relies on computationally expensive statistical inference. This thesis presents a series of algorithms that efficiently and accurately learn MLN structure.

We begin by combining ideas from inductive logic programming (ILP) and feature induction in Markov networks in our MSL system. Previous approaches learn MLN structure in a disjoint manner: they first learn formulas using off-the-shelf ILP systems and then learn formula weights that optimize some measure of the data's likelihood. We present an integrated approach that learns both formulas and weights to jointly optimize likelihood. Next we present the MRC system, which learns latent MLN structure by discovering unary predicates in the form of clusters. MRC forms multiple clusterings of constants and relations, with each cluster corresponding to an invented predicate. We empirically show that by creating multiple clusterings, MRC outperforms previous systems. Then we apply a variant of MRC to the long-standing AI problem of extracting knowledge from text. Our system extracts simple semantic networks from Web text in an unsupervised, domain-independent manner, and introduces several techniques to scale up to the Web.

After that, we incorporate the discovery of latent unary predicates into the learning of MLN clauses in the LHL system. LHL first compresses the data into a compact form by clustering the constants into high-level concepts, and then searches for clauses in the compact representation. We empirically show that LHL is more efficient and finds better formulas than previous systems. Finally, we present the LSM system, which uses random walks to find repeated patterns in data. By restricting its search to within such patterns, LSM is able to find good formulas accurately and efficiently, improving efficiency by 2-5 orders of magnitude compared to previous systems.
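To make concrete how weighted formulas act as feature templates, the following minimal sketch computes the standard MLN distribution, in which the probability of a world is proportional to exp(sum over formulas of weight times number of true groundings). Everything specific here is an assumption for illustration and is not taken from the thesis: the two-constant domain, the single formula Smokes(x) => Cancer(x), the weight 1.5, and all function names. The normalizer is computed by brute-force enumeration, which is only feasible for a toy domain like this one.

```python
import itertools
import math

# Hypothetical toy domain (an assumption, not from the thesis):
# two constants and one weighted formula, Smokes(x) => Cancer(x).
CONSTANTS = ["anna", "bob"]
WEIGHT = 1.5  # illustrative weight only


def count_true_groundings(world):
    """Count the groundings of Smokes(x) => Cancer(x) that hold in `world`."""
    return sum(
        1
        for c in CONSTANTS
        if (not world[("Smokes", c)]) or world[("Cancer", c)]
    )


def all_worlds():
    """Enumerate every truth assignment to the ground atoms."""
    atoms = [(p, c) for p in ("Smokes", "Cancer") for c in CONSTANTS]
    for values in itertools.product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def probability(world):
    """P(world) = exp(w * n(world)) / Z, with Z summed over all worlds."""
    z = sum(math.exp(WEIGHT * count_true_groundings(w)) for w in all_worlds())
    return math.exp(WEIGHT * count_true_groundings(world)) / z
```

With four ground atoms there are only 16 worlds, so the sum for Z is trivial. The number of worlds doubles with every additional ground atom, which is one way to see why weight learning that depends on such quantities becomes expensive in realistic domains.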

[1]  Pedro M. Domingos,et al.  Markov Logic: An Interface Layer for Artificial Intelligence , 2009, Markov Logic: An Interface Layer for Artificial Intelligence.

[2]  Jesse Davis,et al.  Change of Representation for Statistical Relational Learning , 2007, IJCAI.

[3]  Editors , 1986, Brain Research Bulletin.

[4]  Pat Langley,et al.  Improving Efficiency by Learning Intermediate Concepts , 1989, IJCAI.

[5]  Luc Dehaspe Maximum Entropy Modeling with Clausal Constraints , 1997, ILP.

[6]  J. Besag Statistical Analysis of Non-Lattice Data , 1975 .

[7]  Raymond J. Mooney,et al.  Learning for Semantic Parsing , 2009, CICLing.

[8]  Matthew Richardson,et al.  The Alchemy System for Statistical Relational AI: User Manual , 2007 .

[9]  J. Lafferty,et al.  High-dimensional Ising model selection using ℓ1-regularized logistic regression , 2010, 1010.0311.

[10]  David Maxwell Chickering,et al.  Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.

[11]  J. R. Quinlan Learning Logical Definitions from Relations , 1990 .

[12]  Michael R. Genesereth,et al.  Logical foundations of artificial intelligence , 1987 .

[13]  Matthew Brand,et al.  A Random Walks Perspective on Maximizing Satisfaction and Profit , 2005, SDM.

[14]  Raymond J. Mooney,et al.  Discriminative structure and parameter learning for Markov logic networks , 2008, ICML '08.

[15]  Christopher Meek,et al.  Inference for Multiplicative Models , 2008, UAI.

[16]  Thomas L. Griffiths,et al.  Hierarchical Topic Models and the Nested Chinese Restaurant Process , 2003, NIPS.

[17]  Ralph Grishman,et al.  Discovering Relations among Named Entities from Large Corpora , 2004, ACL.

[18]  Ben Taskar,et al.  Introduction to statistical relational learning , 2007 .

[19]  L. Lovász Random Walks on Graphs: A Survey , 1993 .

[20]  Hans-Peter Kriegel,et al.  Dirichlet enhanced relational learning , 2005, ICML.

[21]  Andrew McCallum,et al.  Efficiently Inducing Features of Conditional Random Fields , 2002, UAI.

[22]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[23]  Sylvia Richardson,et al.  Markov Chain Monte Carlo in Practice , 1997 .

[24]  Oren Etzioni,et al.  A Latent Dirichlet Allocation Method for Selectional Preferences , 2010, ACL.

[25]  R. Schank,et al.  Inside Computer Understanding , 1981 .

[26]  Stephen J. Wright,et al.  Numerical Optimization , 2018, Fundamental Statistical Inference.

[27]  Ashwin Srinivasan,et al.  Distinguishing Exceptions From Noise in Non-Monotonic Learning , 1992 .

[28]  Oren Etzioni,et al.  Open Information Extraction from the Web , 2007, CACM.

[29]  Andrew McCallum,et al.  A Note on the Unification of Information Extraction and Data Mining using Conditional-Probability, Relational Models , 2003 .

[30]  Lyle H. Ungar,et al.  Cluster-based concept invention for statistical relational learning , 2004, KDD.

[31]  Stefano Ferilli,et al.  Discriminative Structure Learning of Markov Logic Networks , 2008, ILP.

[32]  Nir Friedman,et al.  Discovering Hidden Variables: A Structure-Based Approach , 2000, NIPS.

[33]  Pedro M. Domingos,et al.  Learning the structure of Markov logic networks , 2005, ICML.

[34]  J. Davenport Editor , 1960 .

[35]  Eugene Charniak,et al.  Toward a model of children's story comprehension , 1972 .

[36]  Saso Dzeroski,et al.  Inductive Logic Programming: Techniques and Applications , 1993 .

[37]  Raymond J. Mooney,et al.  Bottom-up learning of Markov logic network structure , 2007, ICML '07.

[38]  Raymond J. Mooney,et al.  Learning Relations by Pathfinding , 1992, AAAI.

[39]  Purnamrita Sarkar,et al.  Fast incremental proximity search in large graphs , 2008, ICML '08.

[40]  Stephen Muggleton,et al.  Inverse entailment and progol , 1995, New Generation Computing.

[41]  Stephen Muggleton,et al.  Efficient Induction of Logic Programs , 1990, ALT.

[42]  Philip S. Yu,et al.  Spectral clustering for multi-type relational data , 2006, ICML.

[43]  Pedro M. Domingos,et al.  On the Optimality of the Simple Bayesian Classifier under Zero-One Loss , 1997, Machine Learning.

[44]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[45]  Luke S. Zettlemoyer,et al.  Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars , 2005, UAI.

[46]  Kenneth Ward Church,et al.  Query suggestion using hitting time , 2008, CIKM '08.

[47]  Inderjit S. Dhillon,et al.  Information-theoretic co-clustering , 2003, KDD '03.

[48]  Thomas Hofmann,et al.  Learning annotated hierarchies from relational data , 2007 .

[49]  H. Chertkow,et al.  Semantic memory , 2002, Current neurology and neuroscience reports.

[50]  Satoshi Sekine,et al.  Preemptive Information Extraction using Unrestricted Relation Discovery , 2006, NAACL.

[51]  Tom M. Mitchell,et al.  Learning to Extract Symbolic Knowledge from the World Wide Web , 1998, AAAI/IAAI.

[52]  Pedro M. Domingos,et al.  Sound and Efficient Inference with Probabilistic and Deterministic Dependencies , 2006, AAAI.

[53]  Thomas L. Griffiths,et al.  Learning Systems of Concepts with an Infinite Relational Model , 2006, AAAI.

[54]  M. Dyer In-Depth Understanding: A Computer Model of Integrated Processing for Narrative Comprehension , 1983 .

[55]  Andrew McCallum,et al.  Efficient clustering of high-dimensional data sets with application to reference matching , 2000, KDD '00.

[56]  Pedro M. Domingos,et al.  Extracting Semantic Networks from Text Via Relational Clustering , 2008, ECML/PKDD.

[57]  Dan Roth,et al.  On the Hardness of Approximate Reasoning , 1993, IJCAI.

[58]  Matthew Richardson,et al.  Markov logic networks , 2006, Machine Learning.

[59]  J. Pitman Combinatorial Stochastic Processes , 2006 .

[60]  Luc De Raedt,et al.  Integrating Naïve Bayes and FOIL , 2007, J. Mach. Learn. Res..

[61]  Oren Etzioni,et al.  Strategies for lifelong knowledge extraction from the web , 2007, K-CAP '07.

[62]  Oren Etzioni,et al.  Unsupervised Resolution of Objects and Relations on the Web , 2007, NAACL.

[63]  R. Rummel Dimensionality of Nations project: attributes of nations and behavior of nation dyads , 1999 .

[64]  Leo Grady,et al.  Isoperimetric graph partitioning for image segmentation , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[65]  Luc De Raedt,et al.  Clausal Discovery , 1997, Machine Learning.

[66]  J. Lafferty,et al.  Mixed-membership models of scientific publications , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[67]  Pedro M. Domingos,et al.  Using Structural Motifs for Learning Markov Logic Networks , 2010, Statistical Relational Artificial Intelligence.

[68]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[69]  Ah-Hwee Tan,et al.  Mining semantic networks for knowledge discovery , 2003, Third IEEE International Conference on Data Mining.

[70]  W. Freeman,et al.  Generalized Belief Propagation , 2000, NIPS.

[71]  Pedro M. Domingos,et al.  Statistical predicate invention , 2007, ICML '07.

[72]  John D. Lafferty,et al.  Inducing Features of Random Fields , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[73]  M. Degroot,et al.  Probability and Statistics , 1977 .

[74]  Jesse Davis,et al.  An Integrated Approach to Learning Bayesian Networks of Rules , 2005, ECML.

[75]  Raymond J. Mooney,et al.  Learning Synchronous Grammars for Semantic Parsing with Lambda Calculus , 2007, ACL.

[76]  Mark Craven,et al.  Relational Learning with Statistical Predicate Invention: Better Models for Hypertext , 2001, Machine Learning.

[77]  Stefano Ferilli,et al.  Structure Learning of Markov Logic Networks through Iterated Local Search , 2008, ECAI.

[78]  Nir Friedman,et al.  Bayesian Network Classifiers , 1997, Machine Learning.

[79]  Raymond J. Mooney,et al.  Adaptive duplicate detection using learnable string similarity measures , 2003, KDD '03.

[80]  Hans-Peter Kriegel,et al.  Infinite Hidden Relational Models , 2006, UAI.

[81]  Nir Friedman,et al.  Learning Hidden Variable Networks: The Information Bottleneck Approach , 2005, J. Mach. Learn. Res..

[82]  Alexa T. McCray,et al.  An Upper-Level Ontology for the Biomedical Domain , 2003, Comparative and functional genomics.

[83]  Pedro M. Domingos,et al.  Lifted First-Order Belief Propagation , 2008, AAAI.

[84]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[85]  Stanley Kok,et al.  Hitting the Right Paraphrases in Good Time , 2010, NAACL.

[86]  Luc De Raedt,et al.  nFOIL: Integrating Naïve Bayes and FOIL , 2005, AAAI.

[87]  Marius Pasca,et al.  Latent Variable Models of Concept-Attribute Attachment , 2009, ACL/IJCNLP.

[88]  Richard M. Karp,et al.  Monte-Carlo algorithms for enumeration and reliability problems , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[89]  Stephen Muggleton,et al.  Machine Invention of First Order Predicates by Inverting Resolution , 1988, ML.

[90]  Pedro M. Domingos,et al.  Learning Markov logic network structure via hypergraph lifting , 2009, ICML '09.

[91]  Michael J. Pazzani,et al.  Relational Clichés: Constraining Induction During Relational Learning , 1991, ML.

[92]  Daphne Koller,et al.  Efficient Structure Learning of Markov Networks using L1-Regularization , 2006, NIPS.

[93]  Flemming Topsøe,et al.  Jensen-Shannon divergence and Hilbert space embedding , 2004, International Symposium on Information Theory, 2004. ISIT 2004. Proceedings..

[94]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[95]  Tom M. Mitchell,et al.  Improving Text Classification by Shrinkage in a Hierarchy of Classes , 1998, ICML.

[96]  Raymond J. Mooney,et al.  Combining Top-down and Bottom-up Techniques in Inductive Logic Programming , 1994, ICML.

[97]  Jianfeng Gao,et al.  Scalable training of L1-regularized log-linear models , 2007, ICML '07.

[98]  Oren Etzioni,et al.  Machine Reading , 2006, AAAI.

[99]  Andrew Y. Ng,et al.  Learning random walk models for inducing word dependency distributions , 2004, ICML.

[100]  David J. Aldous,et al.  Lower bounds for covering times for reversible Markov chains and random walks on graphs , 1989 .

[101]  Wendy Grace Lehnert,et al.  The Process of Question Answering , 2022 .