Representing Meaning with a Combination of Logical Form and Vectors

NLP tasks differ in the semantic information they require, and at present no single semantic representation fulfills all requirements. Logic-based representations characterize sentence structure but do not capture the graded aspect of meaning. Distributional models give graded similarity ratings for words and phrases but do not adequately capture overall sentence structure. It has therefore been argued that the two are complementary. In this paper, we adopt a hybrid approach that combines logic-based and distributional semantics through probabilistic logic inference in Markov Logic Networks (MLNs). We focus on recognizing textual entailment (RTE), a task that can utilize the strengths of both representations. Our system has three components. 1) Parsing and task representation, where input RTE problems are represented in probabilistic logic. This is quite different from representing them in standard first-order logic. 2) Knowledge base construction in the form of weighted inference rules from different sources such as WordNet, paraphrase collections, and lexical and phrasal distributional rules generated on the fly. We use a variant of Robinson resolution to determine the necessary inference rules. More sources can easily be added by mapping them to logical rules; our system learns a resource-specific weight that counteracts scaling differences between resources. 3) Inference, where we show how to solve the inference problems efficiently. In this paper we focus on the SICK dataset, on which we achieve a state-of-the-art result. Our system handles overall sentence structure and phenomena like negation in the logic, then uses our Robinson resolution variant to query distributional systems about words and short phrases. Therefore, we use our system to evaluate distributional lexical entailment approaches. We also publish the set of rules queried from the SICK dataset, which can serve as a good resource for evaluating such approaches.
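As a rough illustration of how a weighted inference rule turns an entailment question into a probability, the following is a minimal sketch of Markov Logic inference by brute-force enumeration. It is not the system described above, and it does not attempt the efficient inference the abstract refers to. The atom names, the rule, its weight of 2.0, and the function name `mln_conditional` are all hypothetical choices made for this example. The only thing taken from the MLN formalism is that a world's unnormalized probability is the exponential of the summed weights of the ground formulas it satisfies.

```python
import itertools
import math


def mln_conditional(atoms, weighted_formulas, evidence, query):
    """Return P(query | evidence) by enumerating every truth assignment.

    atoms: list of ground-atom names.
    weighted_formulas: list of (weight, predicate-on-world) pairs.
    evidence: dict mapping some atoms to fixed truth values.
    query: predicate-on-world whose probability we want.
    """
    numerator = 0.0
    denominator = 0.0
    for values in itertools.product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        # Skip worlds that contradict the evidence (the "text").
        if any(world[atom] != value for atom, value in evidence.items()):
            continue
        # Unnormalized weight: exp of the summed weights of satisfied formulas.
        score = math.exp(sum(w for w, formula in weighted_formulas if formula(world)))
        denominator += score
        if query(world):
            numerator += score
    return numerator / denominator


# Hypothetical lexical rule "man(a) -> guy(a)" with an assumed weight of 2.0,
# standing in for a rule scored by some lexical resource.
atoms = ["man(a)", "guy(a)"]
rules = [(2.0, lambda w: (not w["man(a)"]) or w["guy(a)"])]

# Text: "a is a man"; hypothesis: "a is a guy".
p_entail = mln_conditional(
    atoms,
    rules,
    evidence={"man(a)": True},
    query=lambda w: w["guy(a)"],
)
print(round(p_entail, 4))
```

With a single ground rule, the result reduces to the logistic function of the rule weight, so a weight of zero leaves the hypothesis at probability 0.5 and larger weights push it toward 1. Enumeration is exponential in the number of ground atoms, which is why efficient inference matters for realistic sentence pairs.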
