GGP: Glossary Guided Post-processing for Word Embedding Learning

Word embedding learning is the task of mapping each word to a low-dimensional, continuous vector based on a large corpus. To enhance corpus-based word embedding models, researchers use domain knowledge to learn more distinguishable representations via joint-optimization or post-processing models. However, joint-optimization models require long training time, and existing post-processing models mostly incorporate semantic knowledge, so the resulting embeddings capture little functional information. A glossary is a comprehensive linguistic resource, and in previous work it has usually been exploited to enhance word representations via joint-optimization methods. In this paper, we post-process pre-trained word embedding models by incorporating a glossary, capturing more topical and functional information. We propose GGP (Glossary Guided Post-processing word embedding), a model that consists of a global post-processing function to fine-tune each word vector and an auto-encoding model to learn sense representations, and that constrains each post-processed word representation to be similar to the composition of its sense representations. We evaluate our model against two state-of-the-art models on six word topical/functional similarity datasets; it outperforms these competitors by an average of 4.1% across all datasets and outperforms GloVe by more than 7%.
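
To make the described setup more concrete, below is a minimal, hypothetical PyTorch-style sketch of a glossary-guided post-processing objective of this general shape. The module structure, the choice of a linear map as the global post-processing function, the linear auto-encoder, and mean-pooling as the sense-composition function are our own illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlossaryGuidedPostprocessor(nn.Module):
    """Hypothetical sketch: glossary-guided post-processing of word vectors.

    Assumptions (ours, not from the paper): the global post-processing
    function is a linear map, sense representations come from a linear
    auto-encoder over glossary-definition vectors in the same space as
    the word embeddings, and senses are composed by averaging.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.post = nn.Linear(dim, dim)     # global post-processing function
        self.encode = nn.Linear(dim, dim)   # definition vector -> sense vector
        self.decode = nn.Linear(dim, dim)   # sense vector -> reconstructed definition

    def forward(self, word_vec, definition_vecs):
        # word_vec: (dim,) pre-trained embedding of one word
        # definition_vecs: (num_senses, dim) vectors of its glossary definitions
        fine_tuned = self.post(word_vec)          # post-processed word vector
        senses = self.encode(definition_vecs)     # one sense vector per definition
        reconstructed = self.decode(senses)       # auto-encoder reconstruction
        composed = senses.mean(dim=0)             # compose senses (assumed: mean)

        recon_loss = F.mse_loss(reconstructed, definition_vecs)
        # constrain the post-processed word vector and the composition of
        # its sense vectors to be similar (cosine distance used here)
        sim_loss = 1.0 - F.cosine_similarity(fine_tuned, composed, dim=0)
        return fine_tuned, recon_loss + sim_loss
```

In such a sketch, the combined loss would be summed over the vocabulary and minimized with any standard optimizer, after which the outputs of the post-processing function replace the original pre-trained vectors.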
