Comparative Analysis of Word Embeddings for Capturing Word Similarities

Distributed representations have become the most widely used way to represent language in natural language processing tasks. Most deep-learning-based natural language processing models rely on pre-trained distributed word representations, commonly called word embeddings. Identifying the highest-quality word embeddings is therefore crucial for such models, yet selecting the appropriate embeddings is a challenging task because the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods, analysing how well they capture word similarities on existing benchmark datasets of word-pair similarity judgments. Specifically, we conduct a correlation analysis between ground-truth word similarities and the similarities produced by the different word embedding methods.
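
To make the evaluation protocol concrete, below is a minimal sketch of such an intrinsic evaluation in Python. It is an illustration under stated assumptions, not the exact pipeline used in the paper: it assumes gensim (4.x) and scipy are installed, loads pre-trained GloVe vectors through gensim's downloader, and reads a hypothetical tab-separated benchmark file (wordsim353.tsv, an illustrative name) with one word pair and one human similarity score per row.

    # Minimal sketch of an intrinsic word-similarity evaluation.
    # Assumptions: gensim 4.x and scipy are available; "wordsim353.tsv"
    # is a placeholder for a benchmark file of word pairs and human scores.
    import gensim.downloader as api
    from scipy.stats import spearmanr

    # Load pre-trained embeddings (here: 100-dim GloVe vectors from gensim's hub).
    model = api.load("glove-wiki-gigaword-100")

    # Each benchmark row: word1 <TAB> word2 <TAB> human similarity score.
    human_scores, model_scores = [], []
    with open("wordsim353.tsv", encoding="utf-8") as f:
        for line in f:
            w1, w2, score = line.strip().split("\t")
            w1, w2 = w1.lower(), w2.lower()  # GloVe vocabulary is lowercased
            if w1 in model and w2 in model:  # skip out-of-vocabulary pairs
                human_scores.append(float(score))
                model_scores.append(model.similarity(w1, w2))  # cosine similarity

    # Spearman rank correlation between human judgments and embedding similarities.
    rho, p_value = spearmanr(human_scores, model_scores)
    print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g}, n = {len(human_scores)})")

Spearman's rank correlation is the standard choice in this setting because only the relative ranking of pair similarities, not their absolute scale, is comparable across embedding methods.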
