A Psycholinguistic Analysis of BERT’s Representations of Compounds

This work studies the semantic representations learned by BERT for compounds, that is, expressions such as sunlight or bodyguard. We build on recent studies that explore semantic information in Transformers at the word level and test whether BERT aligns with human semantic intuitions when dealing with expressions (e.g., sunlight) whose overall meaning depends, to varying extents, on the semantics of the constituent words (sun, light). We leverage a dataset that includes human judgments on two psycholinguistic measures of compound semantic analysis: lexeme meaning dominance (LMD), which quantifies the weight of each constituent toward the compound meaning, and semantic transparency (ST), which evaluates the extent to which the compound meaning is recoverable from the constituents’ semantics. We show that BERT-based measures moderately align with human intuitions, especially when computed from contextualized representations, and that LMD is overall more predictable than ST. Contrary to the results reported for ‘standard’ words, the higher, more contextualized layers best represent compound meaning. These findings shed new light on BERT’s ability to handle fine-grained semantic phenomena. Moreover, they can provide insights into how speakers represent compounds.
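
To make the setup concrete, the sketch below illustrates one way to derive such measures: it extracts layer-wise BERT representations for a compound and its constituents with HuggingFace Transformers and computes cosine-similarity proxies for LMD and ST. The model name, the mean-pooling of subword vectors, the specific layer, and the two proxy formulas are illustrative assumptions, not the paper’s exact pipeline; for brevity, words are encoded in isolation, whereas a contextualized variant would average representations of the compound across sentence contexts.

```python
# A minimal sketch, assuming bert-base-uncased and mean-pooled subword
# vectors; the LMD/ST proxies below are hypothetical, not the paper's.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def word_vector(word: str, layer: int) -> torch.Tensor:
    """Mean-pool the subword vectors of `word` at the given hidden layer."""
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[layer]  # (1, seq_len, dim)
    return hidden[0, 1:-1].mean(dim=0)  # drop [CLS]/[SEP], average the rest

layer = 10  # a higher layer, in line with the paper's layer-wise finding
v_comp, v_mod, v_head = (word_vector(w, layer) for w in ("sunlight", "sun", "light"))

cos = torch.nn.functional.cosine_similarity
sim_mod = cos(v_comp, v_mod, dim=0).item()    # compound ~ modifier
sim_head = cos(v_comp, v_head, dim=0).item()  # compound ~ head

# Hypothetical proxies: LMD as the head-vs-modifier similarity balance,
# ST as the compound's mean similarity to its two constituents.
lmd_proxy = sim_head - sim_mod
st_proxy = 0.5 * (sim_head + sim_mod)
print(f"LMD proxy: {lmd_proxy:.3f}  ST proxy: {st_proxy:.3f}")
```

Alignment with the human ratings would then be quantified by computing these proxies over the full set of compounds and correlating them with the LMD and ST judgments, e.g., via Spearman’s rank correlation (scipy.stats.spearmanr).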
