On Learning Word Embeddings From Linguistically Augmented Text Corpora

Word embedding learning, a technique in Natural Language Processing (NLP) that maps words into vector-space representations, is one of the most popular research directions in modern NLP by virtue of its potential to boost the performance of many downstream NLP tasks. Nevertheless, most of the underlying word embedding methods, such as word2vec and GloVe, fail to produce high-quality representations when the text corpus is small and sparse. This paper proposes a method to generate effective word embeddings from limited data. Empirically, we show that the proposed model outperforms existing work on the classical word similarity task and on a domain-specific application.
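To make the evaluation setting concrete, the sketch below trains a small word2vec model and scores it on a word similarity task via Spearman correlation against human ratings. This is a minimal sketch assuming a gensim-based baseline; the toy corpus, word pairs, and ratings are hypothetical placeholders, not the paper's data or proposed method.

```python
# Minimal sketch of the classical word similarity evaluation (assumption:
# gensim word2vec baseline; corpus, pairs, and ratings below are placeholders).
from gensim.models import Word2Vec
from scipy.stats import spearmanr

# Toy corpus standing in for a small, sparse domain-specific corpus.
corpus = [
    ["malware", "infects", "the", "host", "machine"],
    ["the", "virus", "spreads", "through", "email"],
    ["antivirus", "software", "detects", "the", "virus"],
    ["malware", "and", "virus", "samples", "were", "analysed"],
]

# Train a small skip-gram model (sg=1) on the toy corpus.
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

# Hypothetical word pairs with human similarity ratings (SimLex-999 style).
pairs = [("malware", "virus", 8.5), ("virus", "email", 3.0), ("host", "machine", 6.0)]

model_scores = [model.wv.similarity(w1, w2) for w1, w2, _ in pairs]
human_scores = [rating for _, _, rating in pairs]

# Spearman rank correlation between model and human judgements is the
# standard figure of merit for the word similarity task.
rho, _ = spearmanr(model_scores, human_scores)
print(f"Spearman correlation: {rho:.3f}")
```

In practice the evaluation uses established benchmarks such as SimLex-999 rather than hand-written pairs; the structure of the computation is the same.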
