A survey of word embeddings based on deep learning

Word embeddings form the representational basis for downstream natural language processing tasks: they capture lexical semantics in numerical form, turning the abstract meaning of words into vectors that models can compute with. Recently, word embedding approaches based on deep learning have attracted extensive attention and are widely used in many tasks, such as text classification, knowledge mining, question answering, and smart Internet of Things systems. These neural-network-based models rest on the distributional hypothesis, under which the semantic association between words can be computed efficiently in a low-dimensional space. However, the semantics most models express are constrained by the context distribution of each word in the corpus, while logic and commonsense knowledge remain underexploited. How to use massive multi-source data to better represent natural language and world knowledge therefore still needs to be explored. In this paper, we introduce recent advances in neural-network-based word embeddings together with their technical features, summarize the key challenges and existing solutions, and give an outlook on future research and applications.
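
As a concrete illustration of how semantic association is measured in a low-dimensional embedding space, the sketch below computes cosine similarity between word vectors. The vectors here are hand-made toy values for illustration only; in practice they would be learned from corpus statistics by a model such as word2vec, GloVe, or fastText, typically in 100-300 dimensions.

```python
import numpy as np

# Toy embedding table: these 4-dimensional vectors are illustrative,
# not trained. Real models learn such vectors from corpus co-occurrence
# statistics, following the distributional hypothesis.
embeddings = {
    "king":  np.array([0.8, 0.6, 0.1, 0.2]),
    "queen": np.array([0.7, 0.7, 0.1, 0.3]),
    "apple": np.array([0.1, 0.2, 0.9, 0.6]),
}

def cosine_similarity(u, v):
    """Semantic association as the cosine of the angle between vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Semantically related words lie close together in the embedding space ...
print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
# ... while unrelated words lie farther apart.
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # lower
```

Because similarity reduces to a dot product over dense low-dimensional vectors, these comparisons scale to vocabularies of millions of words, which is what makes the distributional approach computationally attractive.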
