Don’t Invite BERT to Drink a Bottle: Modeling the Interpretation of Metonymies Using BERT and Distributional Representations

In this work, we carry out two experiments to assess the ability of BERT to capture the meaning shift associated with metonymic expressions. We test the model on a new dataset that is representative of the most common types of metonymy. We compare BERT with the Structured Distributional Model (SDM), a model for the representation of words in context based on the notion of Generalized Event Knowledge. The results reveal that, while BERT's ability to deal with metonymy is quite limited, SDM is good at predicting the meaning of metonymic expressions, providing support for an account of metonymy based on event knowledge.
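To make the kind of probe concrete, below is a minimal, hypothetical sketch assuming the Hugging Face transformers library and a masked-language-model query. It illustrates the general idea of testing whether BERT's contextual expectations accommodate a container-for-content metonymy such as "drink a bottle"; it is not the paper's actual experimental protocol or dataset, and the example sentence and candidate fillers are invented for illustration.

# Hypothetical probe (not the paper's protocol): ask BERT's masked-LM head
# which objects it expects for "drank", and check whether a metonymic filler
# like "bottle" (container standing for its content) competes with literal
# liquids such as "beer" or "water".
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

sentence = "He drank the whole [MASK] before dinner."
inputs = tokenizer(sentence, return_tensors="pt")
# Locate the masked object position.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits

# Probability distribution over the vocabulary at the masked position.
probs = torch.softmax(logits[0, mask_pos][0], dim=-1)
for word in ["bottle", "beer", "water", "glass"]:
    word_id = tokenizer.convert_tokens_to_ids(word)
    print(f"{word}: {probs[word_id].item():.4f}")

If "bottle" receives probability comparable to the literal liquids, the model's expectations tolerate the metonymic shift in this context; if it lags far behind, that is one symptom of the limited handling of metonymy that the experiments document more systematically.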
