Quasi Bidirectional Encoder Representations from Transformers for Word Sense Disambiguation

While contextualized embeddings have produced performance breakthroughs in many Natural Language Processing (NLP) tasks, Word Sense Disambiguation (WSD) has not yet benefited from them. In this paper we introduce QBERT, a Transformer-based architecture for contextualized embeddings that uses a co-attentive layer to produce more deeply bidirectional representations, better suited to the WSD task. As a result, we are able to train a WSD system that beats the state of the art on the concatenation of all evaluation datasets by over 3 points, also outperforming a comparable model that uses ELMo.
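
To make the mechanism named above concrete, below is a minimal PyTorch sketch of one way a co-attentive layer could fuse two unidirectional Transformer streams (left-to-right and right-to-left) into a single, more deeply bidirectional representation. All names (CoAttentiveFusion, h_fwd, h_bwd), the fuse-by-concatenation design, and the dimensions are illustrative assumptions, not the paper's actual implementation.

    # Hypothetical sketch of a co-attentive fusion layer; not the authors' code.
    # Assumes two unidirectional Transformer streams have already encoded the
    # sentence, yielding per-token states h_fwd (left-to-right) and h_bwd
    # (right-to-left), each of shape (batch, seq_len, d_model).
    import torch
    import torch.nn as nn

    class CoAttentiveFusion(nn.Module):
        """Fuses forward and backward Transformer states via cross-attention."""

        def __init__(self, d_model: int, n_heads: int = 8):
            super().__init__()
            # Each direction queries the hidden states of the opposite direction.
            self.fwd_attends_bwd = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.bwd_attends_fwd = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(2 * d_model)

        def forward(self, h_fwd: torch.Tensor, h_bwd: torch.Tensor) -> torch.Tensor:
            f, _ = self.fwd_attends_bwd(h_fwd, h_bwd, h_bwd)  # forward queries backward
            b, _ = self.bwd_attends_fwd(h_bwd, h_fwd, h_fwd)  # backward queries forward
            # Concatenate the two co-attended views into one contextual embedding.
            return self.norm(torch.cat([f, b], dim=-1))       # (batch, seq_len, 2*d_model)

    if __name__ == "__main__":
        fusion = CoAttentiveFusion(d_model=512)
        h_fwd = torch.randn(2, 10, 512)    # stand-in for forward-stream states
        h_bwd = torch.randn(2, 10, 512)    # stand-in for backward-stream states
        print(fusion(h_fwd, h_bwd).shape)  # torch.Size([2, 10, 1024])

Cross-attending the two streams, rather than simply concatenating their final states as a BiLSTM-style encoder would, lets each position weigh the opposite direction's context token by token; this is one plausible reading of the "co-attentive" layer described in the abstract.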
