Improved Learning of Word Embeddings with Word Definitions and Semantic Injection

Recently, two categories of linguistic knowledge sources, word definitions from monolingual dictionaries and linguistic relations (e.g., synonymy and antonymy), have been leveraged separately to improve traditional co-occurrence-based methods for learning word embeddings. In this paper, we investigate leveraging these two kinds of resources together. Specifically, we propose a new method for word embedding specialization, named Definition Autoencoder with Semantic Injection (DASI). In our experiments, DASI outperforms its single-knowledge-source counterparts on two semantic similarity benchmarks, and the improvements carry over to a downstream task of dialog state tracking. We also show that DASI is superior to simple combinations of existing methods at incorporating the two knowledge sources.
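To make the two knowledge sources concrete, the following is a minimal illustrative sketch, not the authors' actual DASI model: a definition term nudges each word's embedding toward the centroid of its definition words (standing in for the definition autoencoder), while a semantic-injection term pulls synonym pairs together and pushes antonym pairs apart, in the spirit of attract-repel specialization. All function names, hyperparameters, and update rules here are assumptions chosen for illustration.

```python
import numpy as np

def specialize(emb, definitions, synonyms, antonyms,
               lr=0.1, margin=0.5, steps=50):
    """Hypothetical specialization sketch (not the paper's method).

    emb: dict word -> np.ndarray embedding
    definitions: dict word -> list of definition words
    synonyms / antonyms: lists of (word, word) pairs
    """
    emb = {w: v.copy() for w, v in emb.items()}
    for _ in range(steps):
        # Definition term: move each defined word toward the mean
        # embedding of the words in its dictionary definition.
        for w, def_words in definitions.items():
            centroid = np.mean([emb[d] for d in def_words], axis=0)
            emb[w] += lr * (centroid - emb[w])
        # Semantic injection, attract: pull synonym pairs together.
        for a, b in synonyms:
            delta = emb[b] - emb[a]
            emb[a] += lr * 0.5 * delta
            emb[b] -= lr * 0.5 * delta
        # Semantic injection, repel: push antonym pairs apart
        # whenever they fall inside the margin.
        for a, b in antonyms:
            delta = emb[a] - emb[b]
            dist = np.linalg.norm(delta)
            if dist < margin:
                push = lr * (margin - dist) * delta / (dist + 1e-8)
                emb[a] += push
                emb[b] -= push
    return emb
```

In the actual paper the definition signal comes from an autoencoder over definition text rather than a fixed centroid, but the sketch shows how the two loss components can act on the same vectors jointly instead of being applied in separate post-processing passes.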
