Prior Disambiguation of Word Tensors for Constructing Sentence Vectors

Recent work has shown that compositional distributional models using element-wise operations on contextual word vectors benefit from the introduction of a prior disambiguation step. The purpose of this paper is to generalise these ideas to tensor-based models, where relational words such as verbs and adjectives are represented by linear maps (higher-order tensors) acting on a number of arguments (vectors). We propose disambiguation algorithms for a number of tensor-based models, which we then test on a variety of tasks. The results show that disambiguation can provide better compositional representations even for tensor-based models. Furthermore, we confirm previous findings regarding the positive effect of disambiguation on vector mixture models, and we compare the effectiveness of the two approaches.
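
As a rough illustration of the general idea, the sketch below first picks a sense for an ambiguous verb and only then composes it with its arguments. It is not the algorithm proposed in the paper: the sense inventory, the prototype vectors, the cosine-based sense selection, and the composition rule (verb matrix applied to the object, then multiplied element-wise with the subject) are all assumptions chosen to keep the example small.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def matvec(matrix, vector):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def disambiguate(senses, context):
    """Return the name of the sense whose prototype vector is closest to the context.

    `senses` maps a hypothetical sense name to (prototype_vector, verb_matrix).
    """
    return max(senses, key=lambda name: cosine(senses[name][0], context))

def compose(subject, verb_matrix, obj):
    """Assumed composition rule: subject * (verb_matrix @ object), element-wise."""
    applied = matvec(verb_matrix, obj)
    return [s * a for s, a in zip(subject, applied)]

# Toy 3-dimensional data, invented for illustration only.
subject = [1.0, 0.0, 1.0]
obj = [0.0, 1.0, 1.0]
senses = {
    "sense_a": ([1.0, 1.0, 0.0],
                [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]),
    "sense_b": ([0.0, 0.0, 1.0],
                [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]),
}

# Step 1: prior disambiguation, using the sum of the argument vectors as context.
context = [s + o for s, o in zip(subject, obj)]
chosen = disambiguate(senses, context)

# Step 2: composition with the sense-specific verb matrix.
sentence = compose(subject, senses[chosen][1], obj)

print(chosen)    # sense_b
print(sentence)  # [1.0, 0.0, 0.0]
```

Separating the two steps means the composition function never sees the ambiguous verb representation, only the one selected for the current context.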
