Improving Interpretability of Word Embeddings by Generating Definition and Usage

Word embeddings, which encode semantic and syntactic features, have recently achieved success in many natural language processing tasks. However, the lexical semantics captured by these embeddings are difficult to interpret because of their dense vector representations. To improve the interpretability of word vectors, we explore the definition modeling task and propose a novel framework, Semantics-Generator, that generates more reasonable and understandable context-dependent definitions. Moreover, we introduce usage modeling and study whether distributed representations can be used to generate example sentences for words. These forms of semantics generation express an embedding's semantics more directly and explicitly. Two multi-task learning methods are used to combine usage modeling and definition modeling. To verify our approach, we construct the Oxford-2019 dataset, in which each entry contains a word, its context, an example sentence, and the corresponding definition. Experimental results show that Semantics-Generator achieves state-of-the-art results on definition modeling and that the multi-task learning methods improve performance on both tasks.
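To make the multi-task setup concrete, below is a minimal sketch, not the authors' implementation, of one common way to combine two generation tasks over a shared encoder: the encoder reads the target word in context, and two task-specific decoders generate the definition and the usage (example sentence) respectively. All class names, layer sizes, and the conditioning scheme here are illustrative assumptions; the paper's actual two multi-task methods may differ.

```python
# Hypothetical sketch of a shared-encoder, two-decoder multi-task model
# for definition and usage generation. Names and dimensions are assumptions,
# not the paper's Semantics-Generator architecture.
import torch
import torch.nn as nn

class MultiTaskSemanticsSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared encoder over the context of the word being defined.
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        # Task-specific decoders: one per generation task.
        self.def_decoder = nn.LSTM(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.usage_decoder = nn.LSTM(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.def_out = nn.Linear(hid_dim, vocab_size)
        self.usage_out = nn.Linear(hid_dim, vocab_size)

    def forward(self, context_ids, target_ids, task):
        # Encode the context; keep the final hidden state as a summary.
        _, (h, c) = self.encoder(self.embed(context_ids))
        # Condition each decoder step on the final encoder state.
        dec_in = self.embed(target_ids)
        cond = h[-1].unsqueeze(1).expand(-1, dec_in.size(1), -1)
        dec_in = torch.cat([dec_in, cond], dim=-1)
        if task == "definition":
            out, _ = self.def_decoder(dec_in, (h, c))
            return self.def_out(out)
        out, _ = self.usage_decoder(dec_in, (h, c))
        return self.usage_out(out)
```

Training would alternate between (or sum) the cross-entropy losses of the two decoders, so gradients from both tasks update the shared encoder; this parameter-sharing pattern is one standard realization of the multi-task combination described above.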
