Evaluating word embedding models: methods and experimental results

This work conducts an extensive evaluation of a large number of word embedding models for language processing applications. First, we introduce popular word embedding models and discuss desired properties of word models and evaluation methods (or evaluators). Then, we categorize evaluators into two types: intrinsic and extrinsic. Intrinsic evaluators test the quality of a representation independently of specific natural language processing tasks, while extrinsic evaluators use word embeddings as input features to a downstream task and measure changes in performance metrics specific to that task. We report experimental results of intrinsic and extrinsic evaluators on six word embedding models. The results show that different evaluators focus on different aspects of word models, and some are more correlated with natural language processing tasks than others. Finally, we adopt correlation analysis to study the performance consistency between intrinsic and extrinsic evaluators.
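To make the final step concrete, the following is a minimal sketch, not the authors' code, of how consistency between an intrinsic evaluator and an extrinsic evaluator can be quantified with a Pearson correlation computed across models. The model names and all scores below are illustrative placeholders assumed for the example, not results from this work.

# A minimal, hypothetical sketch of the correlation analysis described above.
# Model names follow common choices in the embedding literature; all scores
# are placeholder values, not results reported in this work.
import numpy as np
from scipy.stats import pearsonr

models = ["SGNS", "CBOW", "GloVe", "fastText", "ngram2vec", "Dict2vec"]

# Intrinsic scores, e.g. Spearman correlation on a word-similarity set (hypothetical).
intrinsic = np.array([0.64, 0.61, 0.60, 0.66, 0.65, 0.68])

# Extrinsic scores, e.g. F1 of an NER tagger trained on each model's embeddings (hypothetical).
extrinsic = np.array([0.88, 0.86, 0.87, 0.89, 0.88, 0.90])

# The Pearson correlation across models measures how consistently this
# intrinsic evaluator predicts downstream performance.
r, p_value = pearsonr(intrinsic, extrinsic)
print(f"Pearson r = {r:.3f} (p = {p_value:.3f})")

A high correlation across models would suggest the intrinsic evaluator is a useful proxy for the downstream task; in practice this comparison is repeated over many intrinsic test sets and downstream tasks.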
