Learning to Learn to Disambiguate: Meta-Learning for Few-Shot Word Sense Disambiguation

The success of deep learning methods hinges on the availability of large training datasets annotated for the task of interest. Unlike human learners, these methods lack versatility and struggle to adapt quickly to new tasks for which labeled data is scarce. Meta-learning aims to solve this problem by training a model on a large number of few-shot tasks, with the objective of learning new tasks quickly from a small number of examples. In this paper, we propose a meta-learning framework for few-shot word sense disambiguation (WSD), where the goal is to learn to disambiguate unseen words from only a few labeled instances. Meta-learning approaches have so far typically been tested in an N-way, K-shot classification setting, in which each task has N classes with K examples per class. By its nature, WSD deviates from this controlled setup and requires models to handle a large number of highly unbalanced classes. We extend several popular meta-learning approaches to this scenario and analyze their strengths and weaknesses in this new, challenging setting.
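To make the contrast between the two settings concrete, the following is a minimal sketch, not the paper's data pipeline. The function names `sample_episode` and `split_word_episode`, the `(item, label)` data layout, and the query-set size parameter are all assumptions introduced for illustration. The first function draws a balanced N-way, K-shot episode; the second splits the instances of a single word into support and query sets without balancing, so the number of senses and the examples per sense are whatever the data contains.

```python
import random
from collections import defaultdict


def sample_episode(examples, n_way, k_shot, n_query, rng):
    """Sample a balanced N-way, K-shot episode (hypothetical helper).

    examples: list of (item, label) pairs.
    Returns (support, query), each a list of (item, label) pairs.
    """
    by_label = defaultdict(list)
    for item, label in examples:
        by_label[label].append(item)
    # Only classes with enough examples for support + query are eligible.
    eligible = sorted(c for c, items in by_label.items()
                      if len(items) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient examples")
    support, query = [], []
    for c in rng.sample(eligible, n_way):
        picked = rng.sample(by_label[c], k_shot + n_query)
        support += [(x, c) for x in picked[:k_shot]]
        query += [(x, c) for x in picked[k_shot:]]
    return support, query


def split_word_episode(instances, n_support, rng):
    """Split one word's sense-labeled instances into support and query.

    No balancing is applied (an assumption of this sketch): sense counts
    in the support set simply follow the word's skewed distribution.
    """
    shuffled = list(instances)
    rng.shuffle(shuffled)
    return shuffled[:n_support], shuffled[n_support:]
```

In the balanced case every episode has the same shape, whereas in the word-level split a rare sense may appear in the support set only once or not at all, which is the difficulty the abstract points to.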
