Markov logic for machine reading

A long-standing goal of AI and natural language processing (NLP) is to harness human knowledge by automatically understanding text. This goal, known as machine reading, has become increasingly urgent with the rise of billions of web documents. However, progress in machine reading has been slow because of a combination of key challenges: the complexity and uncertainty involved in representing and reasoning with knowledge, and the prohibitive cost of providing direct supervision (e.g., designing the meaning representation and labeling examples) for training a machine reading system.
In this dissertation, I propose a unifying approach to machine reading based on Markov logic. Markov logic defines a probabilistic model by a set of weighted first-order logical formulas. It provides an ideal language for representing and reasoning with complex, probabilistic knowledge, and it opens up new avenues for leveraging indirect supervision via joint inference, where the labels of some objects can be used to predict the labels of others. I will demonstrate the promise of this approach by presenting a series of works that apply Markov logic to increasingly challenging problems in machine reading.
First, I will describe a joint approach to citation information extraction that combines information across different citations and processing stages. Because Markov logic supplies both a representation language and generic learning and inference algorithms, our solution reduced largely to writing appropriate logical formulas. It achieved state-of-the-art accuracy with substantially less engineering effort than previous approaches. Next, I will describe an unsupervised coreference resolution system that builds on Markov logic to incorporate prior knowledge and conduct large-scale joint inference. This helps compensate for the lack of labeled examples: our unsupervised system often matches or even outperforms previous state-of-the-art supervised systems.
Finally, I will describe the USP system, the first unsupervised approach to jointly inducing a meaning representation and extracting detailed meanings from text. To resolve different linguistic expressions of the same meaning, USP recursively clusters expressions that are composed with or by similar expressions. USP can also induce ontological relations by creating abstractions that capture commonalities among non-synonymous meaning clusters. The result is a state-of-the-art end-to-end machine reading system that can read text, extract knowledge, and answer questions, all without any labeled examples. Markov logic provides an extremely compact representation of the USP model. It also enables future work to "close the loop" by incorporating the extracted knowledge into the model to aid further extraction.
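The following sketch makes "weighted first-order formulas" and "joint inference" concrete. In the standard definition of Markov logic (Richardson and Domingos), a world x has probability P(x) = exp(Σ_i w_i n_i(x)) / Z. Here w_i is the weight of formula i, n_i(x) is the number of true groundings of that formula in x, and Z normalizes over all worlds. The Python sketch below is a toy illustration only, not code from the citation, coreference, or USP systems. The following are all hypothetical:
- the predicates Smokes, Cancer and Friends;
- the two-person domain;
- the weights 1.5 and 1.1;
- every function name.
Brute-force enumeration over the 2^6 possible worlds stands in for the scalable approximate inference a real Markov logic system would use. The sketch shows joint inference in miniature: evidence about Anna changes the predicted label for Bob.

```python
import itertools
import math

# Hypothetical two-person domain and predicates, for illustration only.
PEOPLE = ["Anna", "Bob"]

# All ground atoms: Smokes(p), Cancer(p), and Friends(p, q) for p != q.
ATOMS = (
    [("Smokes", p) for p in PEOPLE]
    + [("Cancer", p) for p in PEOPLE]
    + [("Friends", p, q) for p in PEOPLE for q in PEOPLE if p != q]
)


def implies(a, b):
    return (not a) or b


def n_smoking_causes_cancer(world):
    # Number of true groundings of: Smokes(x) => Cancer(x)
    return sum(implies(world[("Smokes", x)], world[("Cancer", x)]) for x in PEOPLE)


def n_friends_influence(world):
    # Number of true groundings of: Friends(x, y) ^ Smokes(x) => Smokes(y)
    return sum(
        implies(world[("Friends", x, y)] and world[("Smokes", x)], world[("Smokes", y)])
        for x in PEOPLE
        for y in PEOPLE
        if x != y
    )


# (weight w_i, grounding counter n_i); the weights are made up for this sketch.
FORMULAS = [(1.5, n_smoking_causes_cancer), (1.1, n_friends_influence)]


def unnormalized(world):
    # exp(sum_i w_i * n_i(world)); dividing by Z would give P(world).
    return math.exp(sum(w * n(world) for w, n in FORMULAS))


def conditional(query, evidence):
    """P(query is true | evidence), enumerating every world consistent with the evidence."""
    free = [a for a in ATOMS if a not in evidence]
    numerator = denominator = 0.0
    for values in itertools.product([False, True], repeat=len(free)):
        world = dict(evidence)
        world.update(zip(free, values))
        weight = unnormalized(world)
        denominator += weight  # Z cancels in the ratio, so raw weights suffice
        if world[query]:
            numerator += weight
    return numerator / denominator


base = conditional(("Smokes", "Bob"), {})
joint = conditional(
    ("Smokes", "Bob"),
    {("Smokes", "Anna"): True, ("Friends", "Anna", "Bob"): True},
)
print(f"P(Smokes(Bob)) = {base:.3f}")
print(f"P(Smokes(Bob) | Smokes(Anna), Friends(Anna, Bob)) = {joint:.3f}")
```

With these toy weights, observing Smokes(Anna) and Friends(Anna, Bob) raises the probability of Smokes(Bob) above its unconditioned value. The grounding of the second formula for the pair (Anna, Bob) couples the two atoms. Exact enumeration grows as 2^(number of ground atoms), which is why practical Markov logic systems rely on approximate methods such as MC-SAT sampling rather than this brute-force loop.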
