Evaluating Neural Word Representations in Tensor-Based Compositional Settings

We provide a comparative study between neural word representations and traditional vector spaces based on co-occurrence counts, in a number of compositional tasks. We use three different semantic spaces and implement seven tensor-based compositional models, which we then test (together with simpler additive and multiplicative approaches) in tasks involving verb disambiguation and sentence similarity. To check their scalability, we additionally evaluate the spaces using simple compositional methods on larger-scale tasks with less constrained language: paraphrase detection and dialogue act tagging. In the more constrained tasks, co-occurrence vectors are competitive, although the choice of compositional method is important; on the larger-scale tasks, they are outperformed by neural word embeddings, which show robust, stable performance across the tasks.
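As a rough illustration of the simpler additive and multiplicative approaches mentioned above, the following minimal sketch composes a sentence vector from word vectors by element-wise sum or product and compares vectors by cosine similarity. The function names and the toy 3-dimensional vectors are assumptions made for this example only; they are not taken from the paper, and the tensor-based models are not shown.

```python
import math

def add_compose(vectors):
    """Additive composition: element-wise sum of the word vectors."""
    return [sum(components) for components in zip(*vectors)]

def mult_compose(vectors):
    """Multiplicative composition: element-wise product of the word vectors."""
    return [math.prod(components) for components in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy vectors invented for illustration; a real semantic space would have
# hundreds or thousands of dimensions.
toy_space = {
    "dogs": [1.0, 2.0, 0.0],
    "chase": [0.0, 1.0, 1.0],
    "cats": [2.0, 1.0, 1.0],
}

sentence = [toy_space[w] for w in ("dogs", "chase", "cats")]
added = add_compose(sentence)        # [3.0, 4.0, 2.0]
multiplied = mult_compose(sentence)  # [0.0, 2.0, 0.0]
similarity = cosine(added, multiplied)
```

Both operations are commutative, so neither is sensitive to word order; that limitation is one motivation for models that treat relational words such as verbs as tensors.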
