Semantic transparency effects in German compounds: A large dataset and multiple-task investigation

In the present study, we provide a comprehensive analysis and a multi-dimensional dataset of semantic transparency measures for 1,810 German compound words. Compound words are considered semantically transparent when the contribution of the constituents' meanings to the compound meaning is clear (as in airport), but the degree of semantic transparency varies between compounds (compare strawberry or sandman). Our dataset includes both compositional and relatedness-based semantic transparency measures, which are also differentiated by constituent. The measures are obtained from a computational, fully implemented semantic model based on distributional semantics. We validate the measures using data from four behavioral experiments: explicit transparency ratings, two lexical decision tasks with different types of nonwords, and an eye-tracking study. We demonstrate that different semantic effects emerge in different behavioral tasks, and that these effects can only be captured with a multi-dimensional approach to semantic transparency. We further provide the semantic transparency measures derived from the model for a dataset of 40,475 additional German compounds, as well as for 2,061 novel German compounds.
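The abstract distinguishes relatedness-based measures (how similar each constituent's meaning is to the compound's meaning) from compositional measures (how well a meaning composed from the constituents matches the compound's own meaning). The sketch below illustrates that distinction under stated assumptions: meanings are distributional vectors compared by cosine similarity, and composition is, purely for illustration, a hypothetical linear function `M·modifier + H·head`. The function names, the matrices `M` and `H`, and the toy vectors are illustrative and are not the trained model used in the study.

```python
import math
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention used in this sketch for zero vectors
    return dot / (norm_a * norm_b)


def matvec(m: Matrix, v: Vector) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def compose(modifier: Vector, head: Vector, M: Matrix, H: Matrix) -> list[float]:
    """Hypothetical linear composition: M·modifier + H·head."""
    return [a + b for a, b in zip(matvec(M, modifier), matvec(H, head))]


def transparency_measures(modifier: Vector, head: Vector, compound: Vector,
                          M: Matrix, H: Matrix) -> dict[str, float]:
    """Relatedness-based measures per constituent, plus one compositional measure."""
    composed = compose(modifier, head, M, H)
    return {
        # relatedness: similarity of each constituent to the observed compound vector
        "modifier_relatedness": cosine(modifier, compound),
        "head_relatedness": cosine(head, compound),
        # compositional: similarity of the composed vector to the observed compound vector
        "compositional": cosine(composed, compound),
    }


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    toy_modifier = [1.0, 0.0, 0.0]   # toy vectors, not corpus-derived
    toy_head = [0.0, 1.0, 0.0]
    toy_compound = [1.0, 1.0, 0.0]
    print(transparency_measures(toy_modifier, toy_head, toy_compound, identity, identity))
```

With identity matrices the composition reduces to plain vector addition. In a fitted model, `M` and `H` would presumably be estimated from attested compounds, so that the compositional measure reflects how predictable a compound's meaning is from its constituents rather than simple overlap.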
