Documents and Dependencies: an Exploration of Vector Space Models for Semantic Composition

In most previous research on distributional semantics, Vector Space Models (VSMs) of words are built either from topical information (e.g., the documents in which a word appears) or from the syntactic/semantic types of words (e.g., the dependency parse links of a word in sentences), but not from both. In this paper, we explore the utility of combining these two representations to build VSMs for the task of semantic composition of adjective-noun phrases. Through extensive experiments on benchmark datasets, we find that even though a type-based VSM is effective for semantic composition, it is often outperformed by a VSM built from a combination of topic- and type-based statistics. We also introduce a new evaluation task in which we predict the composed vector representation of a phrase from the brain activity of a human subject reading that phrase. We build our VSMs, which contain vectors for both words and phrases, from a large syntactically parsed corpus of 16 billion tokens, and make them publicly available.
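The abstract does not say how the topic-based and type-based statistics are combined or which composition function is used. As a minimal sketch only, assuming a simple scheme (normalize each block of counts separately, then concatenate the blocks) and using additive composition as a stand-in composition function, the idea can be illustrated in plain Python. The toy counts, the dependency feature names such as `amod:red`, and all helper names are hypothetical; the paper's actual weighting, dimensionality reduction, and composition functions may differ.

```python
import math

# Hypothetical toy counts. Topic-based: word -> {document id: count}.
topic_counts = {
    "red": {"d1": 3, "d2": 1},
    "car": {"d1": 2, "d3": 4},
}
# Type-based: word -> {dependency feature: count}; feature names are made up.
type_counts = {
    "red": {"amod_head:car": 5, "amod_head:apple": 2},
    "car": {"amod:red": 5, "dobj_of:drive": 3},
}


def l2_normalize(vec):
    """Scale a sparse vector (feature -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {k: w / norm for k, w in vec.items()} if norm > 0 else dict(vec)


def combined_vector(word, topics, deps):
    """Concatenate a word's topic block and type block into one sparse vector.

    Each block is normalized separately (an assumption) so that neither
    dominates just because its raw counts are larger; the "doc:" and "dep:"
    prefixes keep the two feature spaces disjoint.
    """
    topic_block = l2_normalize(topics.get(word, {}))
    type_block = l2_normalize(deps.get(word, {}))
    vec = {f"doc:{k}": w for k, w in topic_block.items()}
    vec.update({f"dep:{k}": w for k, w in type_block.items()})
    return vec


def compose_add(u, v):
    """Additive composition: the phrase vector is the element-wise sum."""
    out = dict(u)
    for k, w in v.items():
        out[k] = out.get(k, 0.0) + w
    return out


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


red = combined_vector("red", topic_counts, type_counts)
car = combined_vector("car", topic_counts, type_counts)
phrase = compose_add(red, car)  # composed vector for "red car"
print(round(cosine(phrase, car), 3))
```

In an evaluation of composition, a composed vector like `phrase` would typically be compared (for example by cosine similarity) against a vector observed for the whole phrase in the corpus. Other composition functions, such as element-wise multiplication, can be tried by swapping out `compose_add`.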
