Separated by an Un-common Language: Towards Judgment Language Informed Vector Space Modeling

A common evaluation practice in the vector space model (VSM) literature is to measure a model's ability to predict human judgments about lexical semantic relations between word pairs. Most existing evaluation sets, however, consist of scores collected for English word pairs only, ignoring the potential impact on human scores of the judgment language in which the word pairs are presented. In this paper we translate two prominent evaluation sets, wordsim353 (association) and SimLex999 (similarity), from English into Italian, German and Russian, and collect scores for each dataset from crowdworkers fluent in its language. Our analysis reveals that human judgments are strongly impacted by the judgment language. Moreover, we show that the predictions of monolingual VSMs do not necessarily correlate best with human judgments made in the language used for model training, suggesting that models and humans are affected differently by the language they use when making semantic judgments. Finally, we show that in a large number of setups, multilingual VSM combination yields improved correlations with human judgments, suggesting that multilingualism may partially compensate for the effect of the judgment language on human judgments.
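The evaluation practice described above can be sketched as follows. This is a minimal illustration under stated assumptions: it scores each word pair by the cosine similarity of its two word vectors and reports Spearman's rank correlation against the human scores, which is the common convention for wordsim353- and SimLex999-style datasets; the abstract itself does not name the metric. The function names (`cosine`, `average_ranks`, `spearman`, `evaluate`) and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate(vectors: Dict[str, Sequence[float]],
             pairs: List[Tuple[str, str, float]]) -> float:
    """Correlate model cosines with human scores, skipping out-of-vocabulary pairs."""
    model_scores, human_scores = [], []
    for w1, w2, human in pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(human)
    return spearman(model_scores, human_scores)


if __name__ == "__main__":
    # Hypothetical toy vectors and scores, for illustration only.
    toy_vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.1],
                   "car": [0.0, 1.0], "tiger": [0.8, 0.3]}
    toy_pairs = [("cat", "dog", 8.0), ("cat", "car", 1.0),
                 ("cat", "tiger", 7.0), ("cat", "unicorn", 5.0)]
    print(round(evaluate(toy_vectors, toy_pairs), 6))
```

In this framing, calling `evaluate` with one model against each translated version of the same dataset (one set of `pairs` per judgment language) is one way to compare how a single VSM's correlation changes with the language in which the human scores were collected. Rank correlation is the usual choice here because human ratings and cosine values live on different scales, and only their orderings are compared.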
