Modeling the Influence of Verb Aspect on the Activation of Typical Event Locations with BERT

Prior studies of event knowledge in sentence comprehension have shown that the aspect of the main verb plays an important role in the processing of non-core semantic roles such as locations: when the main verb is imperfective, locations become more salient in the mental representation of the event and are easier for human comprehenders to process. In our study, we tested the popular language model BERT on two datasets derived from experimental studies to determine whether its predictions of prototypical event locations are also influenced by aspect. We found that, although BERT modelled the typicality of locations accurately, it did so independently of verb aspect. Even when the model was forced to focus on the verb phrase by masking the context words in the sentence, its typicality predictions remained accurate; in this masked setting, however, aspect had a stronger influence on the scores, with locations in imperfective sentences receiving lower surprisal values.
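
To make the evaluation concrete, below is a minimal sketch (not the authors' released code) of how a candidate location's surprisal can be estimated with BERT's masked-language-modelling head, assuming the HuggingFace `transformers` API. The sentence pair and the candidate word are illustrative examples, not items from the actual datasets, and the candidate is assumed to be a single WordPiece token.

```python
# Hypothetical illustration: scoring a candidate location with BERT's
# masked-LM head. Sentences and the candidate word are invented examples.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def location_surprisal(sentence_template: str, location: str) -> float:
    """Surprisal (-log2 p) of `location` at the [MASK] slot.

    Assumes `location` maps to a single WordPiece in BERT's vocabulary.
    """
    inputs = tokenizer(sentence_template, return_tensors="pt")
    # Find the position of the [MASK] token in the input sequence.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits
    # Probability distribution over the vocabulary at the masked position.
    probs = torch.softmax(logits[0, mask_pos], dim=-1)
    loc_id = tokenizer.convert_tokens_to_ids(location)
    return -torch.log2(probs[loc_id]).item()

# Imperfective vs. perfective framing of the same event:
print(location_surprisal("The man was swimming in the [MASK].", "pool"))
print(location_surprisal("The man had swum in the [MASK].", "pool"))
```

Comparing the two calls probes the effect the paper investigates: if aspect modulates the activation of typical locations, the imperfective framing should yield a lower surprisal for the same location word.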
