Assessing the time course of the influence of featural, distributional and spatial representations during reading

Ernesto Guerra 1,2 (ernesto.guerra@mpi.nl)
Falk Huettig 2 (falk.huettig@mpi.nl)
Pia Knoeferle 1 (knoeferl@cit-ec.uni-bielefeld.de)

1 Cognitive Interaction Technology Excellence Cluster and Department of Linguistics, Bielefeld University, Inspiration I, 33615, Bielefeld, Germany
2 Max Planck Institute for Psycholinguistics, Wundtlaan 1, Nijmegen, 6525 XD, The Netherlands

Abstract

What does semantic similarity between two concepts mean? How could we measure it? The way in which semantic similarity is calculated might differ depending on the theoretical notion of semantic representation. In an eye-tracking reading experiment, we investigated whether two widely used semantic similarity measures (based on featural or distributional representations) have distinctive effects on sentence reading times. In other words, we explored whether these measures of semantic similarity differ qualitatively. In addition, we examined whether visually perceived spatial distance interacts with either or both of these measures. Our results showed that the effect of featural and distributional representations on reading times can differ both in direction and in its time course. Moreover, both featural and distributional information interacted with spatial distance, yet in different sentence regions and reading measures. We conclude that featural and distributional representations are distinct components of semantic representation.

Keywords: semantic similarity, featural representations, distributional representations, spatial distance, eye tracking, reading.

Introduction

In the context of semantic representation of concepts, two perspectives have dominated research in the cognitive sciences.
On one view, semantic representation is based on the perceived physical characteristics of objects (e.g., shape, color), as well as on the functional knowledge gained through direct interaction with them (e.g., is-edible, used-to-cut; see Cree & McRae, 2003; McClelland & Rogers, 2003; McRae & Boisvert, 1998; McRae, de Sa, & Seidenberg, 1997; McRae et al., 2005; Rogers & McClelland, 2004, 2008; Vigliocco et al., 2004). For example, the word sheep refers to something that bleats, is covered with soft wool, is white or brown, has four legs, and eats grass. This sort of information is generally acquired through the senses. To put it in Andrews and colleagues' words (see Andrews, Vigliocco, & Vinson, 2005, 2007, 2009), this kind of representational information can be described as extra-linguistic, featural and experiential. We will refer to this sort of data as featural representations for the rest of the paper.

On a different view, semantic representation can be captured by examining the statistical dependencies between words across corpora of spoken and written language. Such corpora could include novels, essays, or articles from newspapers and scientific journals, but also transcribed spoken conversations. Latent semantic indexing (LSI; see Deerwester, Dumais, Landauer, Furnas, & Harshman, 1990; Landauer & Dumais, 1997), for instance, is a method that reduces the dimensionality of a language corpus by representing the texts as a term-by-document frequency matrix and decomposing that matrix. In this model, the statistics are derived from a decomposition of the term frequencies in each of the texts. Thus, these data can be described as intra-linguistic, disembodied and distributional; we will refer to them as distributional representations for the rest of the paper.

Indeed, both distributional and featural representations alone can produce models of semantic representation capable of accounting for human behavioral data (McRae et al., 1997; Landauer & Dumais, 1997; Lund & Burgess, 1996; Vigliocco et al., 2004).
For instance, McRae et al. (1997) used feature-based similarity cosines to predict a number of human behavioral responses such as reaction times and similarity ratings. Similarly, Landauer and Dumais (1997) used distributional similarity cosines to predict the performance both of non-native speakers in an English synonym test and of native speakers in a word-sorting task. Such studies, however, have concentrated on one of these sources of information, often neglecting the other.

More recently, evidence from machine learning has shown that models integrating both featural and distributional information can outperform featural-only or distributional-only models (Andrews et al., 2005, 2007, 2009). For instance, Andrews et al. (2007) trained three Bayesian models using either a combination of featural and distributional representations, or featural or distributional representations alone. The three models were then compared on their power to predict human data from three semantic tasks: word association norms, reaction times from a lexical priming experiment, and picture-word interference latencies. Overall, the combined model was the best predictor of human performance in the three tasks.
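
To make the two kinds of similarity cosine concrete, the following minimal sketch computes both for a toy vocabulary. It is an illustration under stated assumptions, not the materials or procedure of any study cited above: the feature vectors, the term-by-document counts, the number of retained dimensions (k = 2), and the helper names `cosine` and `lsa_vectors` are all hypothetical, and the frequency weighting that LSI applications usually apply before the decomposition is omitted.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical featural vectors (toy values, not taken from any published norms).
# Columns: has_wool, has_four_legs, eats_grass, used_to_cut, is_metal
featural = {
    "sheep": [1, 1, 1, 0, 0],
    "goat":  [0, 1, 1, 0, 0],
    "knife": [0, 0, 0, 1, 1],
}

# Hypothetical term-by-document counts: rows are terms, columns are four toy texts.
terms = ["sheep", "goat", "knife"]
counts = np.array([
    [2, 1, 0, 0],   # sheep
    [1, 2, 0, 0],   # goat
    [0, 0, 3, 1],   # knife
], dtype=float)

def lsa_vectors(matrix, k):
    """Reduce term vectors to k dimensions with a truncated singular value decomposition."""
    U, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] * s[:k]   # term coordinates in the reduced space

# Assumption: two dimensions are kept; real applications retain far more.
distributional = dict(zip(terms, lsa_vectors(counts, k=2)))

feat_sim = cosine(featural["sheep"], featural["goat"])
dist_sim = cosine(distributional["sheep"], distributional["goat"])
print(f"featural cosine (sheep, goat):       {feat_sim:.3f}")
print(f"distributional cosine (sheep, goat): {dist_sim:.3f}")
```

In this toy case the two cosines for the same word pair need not agree, because one is computed over feature overlap and the other over co-occurrence patterns in the reduced space.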

[1] K. McRae, et al. Automatic semantic similarity priming. 1998.

[2] Mark S. Seidenberg, et al. On the nature and scope of featural representations of word meaning. Journal of Experimental Psychology: General, 1997.

[3] M. Garrett, et al. Representing the meanings of object and action words: The featural and unitary semantic space hypothesis. Cognitive Psychology, 2004.

[4] Frank Keller, et al. Syntactic priming in comprehension: Parallelism effects with and without coordination. 2010.

[5] David P. Vinson, et al. Semantic feature production norms for a large set of objects and events. Behavior Research Methods, 2008.

[6] Mark S. Seidenberg, et al. Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 2005.

[7] Ken McRae, et al. Category-specific semantic deficits. 2008.

[8] Gabriella Vigliocco, et al. Integrating experiential and distributional data to learn semantic representations. Psychological Review, 2009.

[9] James L. McClelland, et al. The parallel distributed processing approach to semantic cognition. Nature Reviews Neuroscience, 2003.

[10] Mark Andrews, et al. The Role of Attributional and Distributional Information in Semantic Representation. 2005.

[11] G. Vigliocco, et al. The representation of abstract words: why emotion matters. Journal of Experimental Psychology: General, 2011.

[12] Curt Burgess, et al. Producing high-dimensional semantic spaces from lexical co-occurrence. 1996.

[13] D. Barr, et al. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 2013.

[14] James L. McClelland, et al. Semantic Cognition: A Parallel Distributed Processing Approach. 2004.

[15] James L. McClelland, et al. Précis of Semantic Cognition: A Parallel Distributed Processing Approach. Behavioral and Brain Sciences, 2008.

[16] Pia Knoeferle, et al. Abstract language comprehension is incrementally modulated by non-referential spatial information: evidence from eye-tracking. CogSci, 2012.

[17] D. Casasanto, et al. Similarity and proximity: When does close in space mean close in mind? Memory & Cognition, 2008.

[18] David P. Vinson, et al. Evaluating the Contribution of Intra-Linguistic and Extra-Linguistic Data to the Structure of Human Semantic Representations. 2007.

[19] K. Rayner. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 1998.

[20] Richard A. Harshman, et al. Indexing by Latent Semantic Analysis. J. Am. Soc. Inf. Sci., 1990.

[21] T. Landauer, et al. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. 1997.