Leveraging Gloss Knowledge in Neural Word Sense Disambiguation by Hierarchical Co-Attention

The goal of Word Sense Disambiguation (WSD) is to identify the correct meaning of a word in a particular context. Traditional supervised methods use only labeled data (contexts), ignoring rich lexical knowledge such as glosses, which define the meanings of word senses. Recent studies have shown that incorporating glosses into neural networks for WSD yields significant improvements. However, previous models usually build the context representation and the gloss representation separately. In this paper, we find that learning the context and gloss representations can be mutually beneficial. The gloss can help highlight the important words in the context, producing a better context representation, while the context can help locate the key words in the gloss of the correct word sense. We therefore introduce a co-attention mechanism that generates co-dependent representations of the context and the gloss. Furthermore, to capture both word-level and sentence-level information, we extend the attention mechanism in a hierarchical fashion. Experimental results show that our model achieves state-of-the-art results on several standard English all-words WSD test datasets.
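As a rough illustration of the idea (not the paper's exact hierarchical formulation, whose affinity scores and pooling are learned), co-attention between a matrix of context word states and a matrix of gloss word states can be sketched as follows: an affinity matrix scores every context–gloss word pair, and each sentence attends over its own words using the strongest match on the other side. All function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(C, G):
    """Toy co-attention between a context and a gloss.

    C: (n, d) array, hidden states of the n context words.
    G: (m, d) array, hidden states of the m gloss words.
    Returns one d-dimensional vector per side, each a weighted
    average of that side's word states, where the weights depend
    on the other side (hence "co-dependent" representations).
    """
    A = C @ G.T                        # (n, m) word-pair affinities
    a_ctx = softmax(A.max(axis=1), 0)  # (n,) attention over context words
    a_gls = softmax(A.max(axis=0), 0)  # (m,) attention over gloss words
    ctx_vec = a_ctx @ C                # attended context representation
    gls_vec = a_gls @ G                # attended gloss representation
    return ctx_vec, gls_vec
```

In a full model, the affinity computation would involve trainable parameters, and the same attention pattern would be applied at both the word level and the sentence level.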
