A Survey On Neural Word Embeddings

Understanding human language has long been a core challenge on the path toward intelligent machines. The study of meaning in natural language processing (NLP) rests on the distributional hypothesis, according to which language elements derive their meaning from the words that co-occur with them in context. The revolutionary idea of distributed representation mirrors the workings of the human mind in that the meaning of a concept is spread across many neurons, so a loss of activation only slightly degrades memory retrieval. Neural word embeddings transformed the field of NLP by yielding substantial improvements across a wide range of NLP tasks. In this survey, we provide a comprehensive literature review of neural word embeddings. We lay out the theoretical foundations and organize existing work around the interplay between word embeddings and language modeling. We cover neural word embeddings broadly, including early word embeddings, embeddings targeting specific semantic relations, sense embeddings, morpheme embeddings, and, finally, contextual representations. We conclude by describing the benchmark datasets used to evaluate word embeddings, as well as downstream tasks and the performance gains attributable to word embeddings.
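To make the distributional hypothesis concrete, the sketch below trains skip-gram word vectors on a toy corpus and compares words by cosine similarity. This is a minimal illustration only, assuming the gensim 4.x library; the corpus and hyperparameters are chosen purely for demonstration, not as recommendations.

```python
# Minimal sketch: skip-gram embeddings on a toy corpus (illustrative only).
from gensim.models import Word2Vec

# Tiny corpus: "cat" and "dog" share contexts ("sat", "chased"), so the
# distributional hypothesis predicts their vectors should end up close.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "mouse"],
    ["a", "dog", "chased", "a", "ball"],
]

# sg=1 selects the skip-gram objective; vector_size, window, and epochs
# are toy settings for a corpus this small.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=200)

print(model.wv.similarity("cat", "dog"))     # cosine similarity of the two vectors
print(model.wv.most_similar("cat", topn=3))  # nearest neighbors in the vector space
```

On such a small corpus the similarities are noisy, but the mechanism — words acquiring similar vectors because they occur in similar contexts — is the same one that large-scale embedding methods such as word2vec and GloVe exploit.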
