Overcoming Poor Word Embeddings with Word Definitions

Modern natural language understanding models rely on pretrained subword embeddings, but applications may need to reason about words that were never or only rarely seen during pretraining. We show that examples that depend critically on a rare word are more challenging for natural language inference models. We then explore how a model could learn to use definitions, provided in natural text, to overcome this handicap. Our model's understanding of a definition is usually weaker than a well-modeled word embedding, but it recovers most of the performance lost when a word's embedding is completely untrained.
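
To make the setup concrete, the sketch below shows one simple way a definition given in natural text could be exposed to a pretrained NLI classifier: by prepending it to the premise before classification. This is only an illustrative assumption, not the paper's actual architecture; the checkpoint name (`roberta-large-mnli`), the helper `nli_with_definition`, and the example sentences are all hypothetical choices for the sketch.

```python
# Minimal sketch (assumptions noted above): feed a natural-text definition to an
# off-the-shelf NLI model by prepending it to the premise.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "roberta-large-mnli"  # any pretrained NLI checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def nli_with_definition(premise, hypothesis, term=None, definition=None):
    """Classify entailment, optionally prefixing the premise with a definition."""
    if term and definition:
        premise = f"{term} means {definition}. {premise}"
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze()
    labels = [model.config.id2label[i] for i in range(probs.shape[0])]
    return dict(zip(labels, probs.tolist()))

# Example with a rare word the subword tokenizer may split into uninformative pieces.
print(nli_with_definition(
    premise="The chef prepared a dish of lutefisk.",
    hypothesis="The chef prepared a dish of dried fish.",
    term="lutefisk",
    definition="dried whitefish treated with lye",
))
```

In this framing, the definition is consumed as ordinary context rather than through a dedicated embedding, which is one way to probe how much of the gap from a poorly modeled or untrained word a model can recover from text alone.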
