Vector Space Models of Lexical Meaning

Much of this Handbook is based on ideas from Formal Semantics, in which the meanings of phrases or sentences are represented in terms of set-theoretic models. The key intuition behind Formal Semantics, very roughly, is that the world is full of objects; objects have properties; and relations hold between objects. Set-theoretic models are ideal for capturing this intuition, and have been successful at providing formal descriptions of key elements of natural language semantics, for example quantification. This approach has also proven attractive for Computational Semantics, the discipline concerned with representing, and reasoning with, the meanings of natural language utterances using a computer. One reason is that the formalisms used in the set-theoretic approaches, e.g. first-order predicate calculus, have well-defined inference mechanisms which can be implemented on a computer (Blackburn & Bos, 2005).

The approach to natural language semantics taken in this chapter will be rather different. Instead of the set theory employed in most studies in Formal Semantics, it will use a different branch of mathematics, namely the framework of vector spaces and linear algebra. The attraction of using vector spaces is that they provide a natural mechanism for talking about distance and similarity, concepts from geometry. Why should a geometric approach to modelling natural language semantics be appropriate? There are many aspects of semantics, particularly lexical semantics, which require a notion of distance. For example, the meaning of the word cat is closer to the meaning of the word dog than to the meaning of the word car.

The modelling of such distances is now commonplace in Computational Linguistics, since many examples of language technology benefit from knowing how word meanings are related geometrically; for example, a search engine could expand the range of web pages being returned for a set of query terms by considering additional terms which are close in meaning to those in the query. The meanings of words have largely been neglected in Formal Semantics, typically being represented as atomic entities such as dog, whose interpretation is to denote some object (or set of objects) in a set-theoretic model. In this chapter the meanings of words will be represented using vectors, as part of a high-dimensional "semantic space". The fine-grained structure of this space is provided by considering the contexts in which words occur in large corpora of text. Words can easily be compared for similarity in the vector space, using …
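The idea of building word vectors from corpus contexts and comparing them geometrically can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the chapter's own method: the six-sentence toy corpus, the symmetric two-word context window, the use of raw co-occurrence counts, and the choice of cosine similarity (the text above is cut off before it names a similarity measure). The names `CORPUS`, `build_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for illustration; a real semantic space would be
# built from a large corpus of text.
CORPUS = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the cat sleeps on the sofa",
    "the dog sleeps on the rug",
    "the car drives on the road",
    "the driver parked the car",
]


def build_vectors(sentences, window=2):
    """Map each word to a sparse vector of context-word counts.

    A context word is any word at most `window` positions to the left
    or right of the target word within the same sentence.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = build_vectors(CORPUS)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])
print(f"cat ~ dog: {sim_cat_dog:.3f}")
print(f"cat ~ car: {sim_cat_car:.3f}")
```

On this toy corpus, cat comes out closer to dog than to car, because cat and dog share contexts such as chased and sleeps. Raw counts are dominated by frequent words such as the, so this sketch only shows the mechanics of the comparison.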

[1]  Patrick Pantel,et al.  Discovery of inference rules for question-answering , 2001, Natural Language Engineering.

[2]  Michael Moortgat,et al.  Categorial Type Logics , 1997, Handbook of Logic and Language.

[3]  Daoud Clarke Context-theoretic Semantics for Natural Language: an Overview , 2009 .

[4]  Ted Briscoe,et al.  Evaluating the Accuracy of an Unlexicalized Statistical Parser on the PARC DepBank , 2006, ACL.

[5]  Mehrnoosh Sadrzadeh,et al.  Experimental Support for a Categorical Compositional Distributional Model of Meaning , 2011, EMNLP.

[6]  Gemma Boleda,et al.  Distributional Semantics in Technicolor , 2012, ACL.

[7]  Hinrich Schütze,et al.  Automatic Word Sense Discrimination , 1998, Comput. Linguistics.

[8]  Marco Baroni,et al.  Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space , 2010, EMNLP.

[9]  Joachim Lambek,et al.  An Algebraic Approach to French Sentence Structure , 2001, LACL.

[10]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol.

[11]  Emiliano Raúl Guevara,et al.  Computing Semantic Compositionality in Distributional Semantics , 2011, IWCS.

[12]  Eneko Agirre,et al.  A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches , 2009, NAACL.

[13]  Thorsten Brants,et al.  TnT – A Statistical Part-of-Speech Tagger , 2000, ANLP.

[14]  Hwee Tou Ng,et al.  Word Sense Disambiguation Improves Statistical Machine Translation , 2007, ACL.

[15]  Mirella Lapata,et al.  Vector-based Models of Semantic Composition , 2008, ACL.

[16]  Ioannis Korkontzelos,et al.  Estimating Linear Models for Compositional Distributional Semantics , 2010, COLING.

[17]  Magnus Sahlgren,et al.  The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces , 2006 .

[18]  M. Steedman,et al.  Combinatory Categorial Grammar , 2011 .

[19]  Hinrich Schütze Automatic Word Sense Discrimination , 1998 .

[20]  Andrew Y. Ng,et al.  Semantic Compositionality through Recursive Matrix-Vector Spaces , 2012, EMNLP.

[21]  Mark Steyvers,et al.  Topics in semantic representation. , 2007, Psychological review.

[22]  Robert L. Goldstone,et al.  Similarity Involving Attributes and Relations: Judgments of Similarity and Difference Are Not Inverses , 1990 .

[23]  Mirella Lapata,et al.  Language Models Based on Semantic Composition , 2009, EMNLP.

[24]  Michael Moortgat Categorial Type Logics , 1997, Handbook of Logic and Language.

[25]  Dominic Widdows,et al.  Semantic Vector Products: Some Initial Investigations , 2008 .

[26]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[27]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci.

[28]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[29]  Mehrnoosh Sadrzadeh Bell States as Negation in Natural Languages , 2009 .

[30]  Karen Spärck Jones A statistical interpretation of term specificity and its application in retrieval , 2021, J. Documentation.

[31]  H. Alshawi,et al.  The Core Language Engine , 1994 .

[32]  K. Markert,et al.  When logical inference helps determining textual entailment (and when it doesn’t) .

[33]  Stephen Clark,et al.  Concrete Sentence Spaces for Compositional Distributional Models of Meaning , 2010, IWCS.

[34]  T. V. D. Cruys A Non-negative Tensor Factorization Model for Selectional Preference Induction , 2009 .

[35]  Ann A. Copestake,et al.  Invited Talk: Slacker Semantics: Why Superficiality, Dependency and Avoidance of Commitment can be the Right Way to Go , 2009, EACL.

[36]  Francis Jeffry Pelletier,et al.  Representation and Inference for Natural Language: A First Course in Computational Semantics , 2005, Computational Linguistics.

[37]  Ted Briscoe,et al.  The Second Release of the RASP System , 2006, ACL.

[38]  John A. Carroll,et al.  Applied morphological processing of English , 2001, Natural Language Engineering.

[39]  Patrick Pantel,et al.  From Frequency to Meaning: Vector Space Models of Semantics , 2010, J. Artif. Intell. Res.

[40]  Gregory Grefenstette,et al.  Explorations in automatic thesaurus discovery , 1994 .

[41]  James M. Hodgson Informational constraints on pre-lexical priming , 1991 .

[42]  David J. Weir,et al.  Characterizing mildly context-sensitive grammar formalisms , 1988 .

[43]  ChengXiang Zhai,et al.  A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.

[44]  Stephen Clark,et al.  A Type-Driven Tensor-Based Semantics for CCG , 2014, EACL.

[45]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[46]  W. Lowe,et al.  The Direct Route: Mediated Priming in Semantic Space , 2000 .

[47]  T. Landauer,et al.  A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. , 1997 .

[48]  Graeme Hirst,et al.  Evaluating WordNet-based Measures of Lexical Semantic Relatedness , 2006, CL.

[49]  Dominic Widdows,et al.  Geometry and Meaning , 2004, Computational Linguistics.

[50]  J. Fodor,et al.  Connectionism and cognitive architecture: A critical analysis , 1988, Cognition.

[51]  Stefan Evert Distributional Semantic Models , 2010, NAACL.

[52]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[53]  David R. Dowty,et al.  Introduction to Montague semantics , 1980 .

[54]  Edward Grefenstette,et al.  Category-theoretic quantitative compositional distributional models of natural language semantics , 2013, ArXiv.

[55]  Thierry Paul,et al.  Quantum computation and quantum information , 2007, Mathematical Structures in Computer Science.

[56]  Pavel Blagoveston Bochev,et al.  A vector space model for information retrieval with generalized similarity measures. , 2012 .

[57]  Geoffrey E. Hinton Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems , 1991 .

[58]  Mirella Lapata,et al.  Dependency-Based Construction of Semantic Space Models , 2007, CL.

[59]  Hugo Zaragoza,et al.  The Probabilistic Relevance Framework: BM25 and Beyond , 2009, Found. Trends Inf. Retr.

[60]  J. Lambek The Mathematics of Sentence Structure , 1958 .

[61]  T. A. Fowler,et al.  Book Review: Categorial Grammar: Logical Syntax, Semantics, and Processing by Glyn V. Morrill , 2010, International Conference on Computational Logic.

[62]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[63]  Carl Vogel,et al.  Proceedings of the 16th International Conference on Computational Linguistics , 1996, COLING 1996.

[64]  Curt Burgess,et al.  Producing high-dimensional semantic spaces from lexical co-occurrence , 1996 .

[65]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[66]  James R. Curran,et al.  Investigating GIS and Smoothing for Maximum Entropy Taggers , 2003, EACL.

[67]  James Richard Curran,et al.  From distributional to semantic similarity , 2004 .

[68]  J. R. Firth,et al.  A Synopsis of Linguistic Theory, 1930-1955 , 1957 .

[69]  Stephen Clark,et al.  Mathematical Foundations for a Compositional Distributional Model of Meaning , 2010, ArXiv.

[70]  Peter D. Turney Similarity of Semantic Relations , 2006, CL.

[71]  Mitchell P. Marcus,et al.  Building a large annotated corpus of English , 1993 .

[72]  Massimo Poesio,et al.  Strudel: A Corpus-Based Semantic Model Based on Properties and Types , 2010, Cogn. Sci.

[73]  Jason Baldridge,et al.  Non-Transformational Syntax: Formal and Explicit Models of Grammar , 2011 .

[74]  Mark Steedman,et al.  CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank , 2007, CL.

[75]  M. Engelmann The Philosophical Investigations , 2013 .

[76]  Daoud Clarke,et al.  A Context-Theoretic Framework for Compositionality in Distributional Semantics , 2011, Computational Linguistics.

[77]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[78]  James R. Curran,et al.  Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models , 2007, Computational Linguistics.

[79]  Martin Kay,et al.  Syntactic Process , 1979, ACL.

[80]  Tim Van de Cruys,et al.  A non-negative tensor factorization model for selectional preference induction , 2009, Natural Language Engineering.

[81]  Markus Werning,et al.  The Oxford Handbook of Compositionality , 2012 .

[82]  Peter D. Turney A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations , 2008, COLING.