Using BERT for Word Sense Disambiguation

Word Sense Disambiguation (WSD), which aims to identify the correct sense of a given polyseme, is a long-standing problem in NLP. In this paper, we propose to use BERT to extract better polyseme representations for WSD and explore several ways of combining BERT and the classifier. We also utilize sense definitions to train a unified classifier for all words, which enables the model to disambiguate unseen polysemes. Experiments show that our model achieves the state-of-the-art results on the standard English All-word WSD evaluation.

[1]  Mikael Kågebäck,et al.  Word Sense Disambiguation using a Bidirectional LSTM , 2016, CogALex@COLING.

[2]  Michael E. Lesk,et al.  Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone , 1986, SIGDOC '86.

[3]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[4]  Martha Palmer,et al.  SemEval-2007 Task-17: English Lexical Sample, SRL and All Words , 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).

[5]  Zhifang Sui,et al.  Leveraging Gloss Knowledge in Neural Word Sense Disambiguation by Hierarchical Co-Attention , 2018, EMNLP.

[6]  Scott Cotton,et al.  SENSEVAL-2: Overview , 2001, *SEMEVAL.

[7]  Roberto Navigli,et al.  Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison , 2017, EACL.

[8]  Eneko Agirre,et al.  Personalizing PageRank for Word Sense Disambiguation , 2009, EACL.

[9]  Hwee Tou Ng,et al.  Word Sense Disambiguation Improves Information Retrieval , 2012, ACL.

[10]  George A. Miller,et al.  Using a Semantic Concordance for Sense Identification , 1994, HLT.

[11]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[12]  Hwee Tou Ng,et al.  One Million Sense-Tagged Instances for Word Sense Disambiguation and Induction , 2015, CoNLL.

[13]  Annalina Caputo,et al.  An Enhanced Lesk Word Sense Disambiguation Algorithm through a Distributional Semantic Model , 2014, COLING.

[14]  Roberto Navigli,et al.  Neural Sequence Learning Models for Word Sense Disambiguation , 2017, EMNLP.

[15]  Martha Palmer,et al.  The English all-words task , 2004, SENSEVAL@ACL.

[16]  Hwee Tou Ng,et al.  It Makes Sense: A Wide-Coverage Word Sense Disambiguation System for Free Text , 2010, ACL.

[17]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[18]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[19]  Ignacio Iacobacci,et al.  Embeddings for Word Sense Disambiguation: An Evaluation Study , 2016, ACL.

[20]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[21]  Roberto Navigli,et al.  Entity Linking meets Word Sense Disambiguation: a Unified Approach , 2014, TACL.

[22]  Ido Dagan,et al.  context2vec: Learning Generic Context Embedding with Bidirectional LSTM , 2016, CoNLL.

[23]  Zhifang Sui,et al.  Incorporating Glosses into Neural Word Sense Disambiguation , 2018, ACL.

[24]  Ryan Doherty,et al.  Semi-supervised Word Sense Disambiguation with Neural Models , 2016, COLING.

[25]  Eneko Agirre,et al.  Word Sense-Aware Machine Translation: Including Senses as Contextual Features for Improved Translation Models , 2016, LREC.

[26]  Roberto Navigli,et al.  SemEval-2013 Task 12: Multilingual Word Sense Disambiguation , 2013, *SEMEVAL.

[27]  Alec Radford,et al.  Improving Language Understanding by Generative Pre-Training , 2018 .

[28]  Roberto Navigli,et al.  SemEval-2015 Task 13: Multilingual All-Words Sense Disambiguation and Entity Linking , 2015, *SEMEVAL.