Comparing Probabilistic, Distributional and Transformer-Based Models on Logical Metonymy Interpretation

In linguistics and cognitive science, logical metonymies are defined as type clashes between an event-selecting verb and an entity-denoting noun (e.g., The editor finished the article), which are typically interpreted by inferring a hidden event (e.g., reading) on the basis of contextual cues. This paper tackles the problem of logical metonymy interpretation, that is, the retrieval of the covert event via computational methods. We compare different types of models, including the probabilistic and distributional ones previously introduced in the literature on the topic. For the first time, we also test on this task some of the recent Transformer-based models, such as BERT, RoBERTa, XLNet, and GPT-2. Our results show a complex scenario in which the best Transformer-based models and some traditional distributional models perform very similarly. However, the low performance on some of the test datasets suggests that logical metonymy is still a challenging phenomenon for computational modeling.
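As a minimal, hypothetical sketch of how a Transformer-based masked language model could be probed for the covert event (the paper's actual experimental setup may differ), the snippet below queries bert-base-uncased through the Hugging Face transformers fill-mask pipeline and checks which of a few candidate verbs the model ranks highest in the covert-event slot of the metonymic sentence. The model name, prompt, and candidate set are illustrative assumptions, not the authors' protocol.

    # Sketch: probe a masked LM for the covert event in a logical metonymy.
    # Assumes the Hugging Face `transformers` library; model, prompt, and
    # candidate verbs are illustrative, not the paper's exact setup.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # "The editor finished [MASK] the article." -> a plausible model should
    # prefer gerunds such as "reading" or "writing" over implausible events.
    sentence = "The editor finished [MASK] the article."
    candidates = {"reading", "writing", "eating", "driving"}

    # Keep only the candidate verbs among the model's top predictions.
    for pred in fill_mask(sentence, top_k=100):
        token = pred["token_str"].strip()
        if token in candidates:
            print(f"{token}\t{pred['score']:.4f}")

In a comparison like the one described in the abstract, scores of this kind would then be correlated with, or ranked against, the covert events preferred by human speakers or by the probabilistic and distributional baselines.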
