Decomposing Word Embedding with the Capsule Network

Word sense disambiguation aims to identify the appropriate sense of an ambiguous word in a given context. Existing pre-trained language models and methods based on multiple embeddings per word do not sufficiently exploit the power of unsupervised word embeddings. In this paper, we present a capsule network-based approach that takes advantage of the capsule's potential for recognizing highly overlapping features and handling segmentation. We propose a Capsule network-based method to Decompose the unsupervised Embedding of an ambiguous word into context-specific Sense embeddings, called CapsDecE2S. In this approach, the unsupervised ambiguous embedding is fed into a capsule network to produce multiple morpheme-like vectors, which we define as the basic semantic units of meaning. Through attention operations, CapsDecE2S integrates the word's context to reconstruct these morpheme-like vectors into a context-specific sense embedding. To train CapsDecE2S, we propose a sense matching training method that casts sense learning as binary classification, explicitly learning the relation between senses through matching and non-matching labels. We evaluate CapsDecE2S on two sense learning tasks, word in context and word sense disambiguation. Results on two public benchmarks, Word-in-Context and English all-words Word Sense Disambiguation, show that CapsDecE2S achieves a new state of the art on both tasks.
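To make the described pipeline concrete, below is a minimal PyTorch sketch of the three stages the abstract outlines: decomposing a word embedding into morpheme-like capsule vectors, recombining them with context attention into a sense embedding, and training via binary sense matching. This is an illustrative assumption, not the paper's implementation: all layer sizes, the class and function names (`CapsDecE2SSketch`, `squash`, `match_head`), and the mean-pooled dot-product attention are guesses, and dynamic capsule routing is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def squash(x, dim=-1):
    # Capsule "squash" non-linearity (Sabour et al., 2017): shrinks short
    # vectors toward zero and long vectors toward unit length.
    norm_sq = (x ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * x / torch.sqrt(norm_sq + 1e-9)


class CapsDecE2SSketch(nn.Module):
    """Toy decomposition-reconstruction pipeline; sizes and names are assumed."""

    def __init__(self, embed_dim=768, num_capsules=8, capsule_dim=96):
        super().__init__()
        # One linear map per capsule: the ambiguous word embedding is
        # decomposed into `num_capsules` morpheme-like vectors.
        self.capsules = nn.ModuleList(
            nn.Linear(embed_dim, capsule_dim) for _ in range(num_capsules)
        )
        self.context_proj = nn.Linear(embed_dim, capsule_dim)
        self.output_proj = nn.Linear(num_capsules * capsule_dim, embed_dim)
        # Binary head for the sense-matching objective (match / non-match).
        self.match_head = nn.Linear(2 * embed_dim, 2)

    def sense_embedding(self, word_emb, context_embs):
        # word_emb: (batch, embed_dim); context_embs: (batch, seq, embed_dim)
        morphemes = squash(
            torch.stack([c(word_emb) for c in self.capsules], dim=1)
        )  # (batch, num_capsules, capsule_dim)
        # Attention: weight each morpheme-like vector by its affinity
        # with a mean-pooled context representation.
        ctx = self.context_proj(context_embs.mean(dim=1))  # (batch, capsule_dim)
        attn = F.softmax(torch.einsum("bkd,bd->bk", morphemes, ctx), dim=-1)
        weighted = morphemes * attn.unsqueeze(-1)
        # Reconstruct the context-specific sense embedding.
        return self.output_proj(weighted.flatten(start_dim=1))  # (batch, embed_dim)

    def forward(self, word_a, ctx_a, word_b, ctx_b):
        # Sense matching as binary classification: do the two contexts
        # evoke the same sense of the target word?
        sense_a = self.sense_embedding(word_a, ctx_a)
        sense_b = self.sense_embedding(word_b, ctx_b)
        return self.match_head(torch.cat([sense_a, sense_b], dim=-1))


# Usage: two (word embedding, context) pairs in, match/non-match logits out.
model = CapsDecE2SSketch()
logits = model(torch.randn(4, 768), torch.randn(4, 16, 768),
               torch.randn(4, 768), torch.randn(4, 16, 768))
# logits.shape == (4, 2); train with nn.CrossEntropyLoss on match labels.
```

The matching head makes the training signal explicit: rather than predicting a sense inventory label directly, the model only has to decide whether two contextualized sense embeddings agree, which is how the abstract's binary matching/non-matching formulation works.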
