Cloze but no cigar: The complex relationship between cloze, corpus, and subjective probabilities in language processing

Nathaniel J. Smith
njsmith@cogsci.ucsd.edu
UC San Diego Department of Cognitive Science
9500 Gilman Drive #515, La Jolla, CA 92093-0515 USA

Roger Levy
rlevy@ucsd.edu
UC San Diego Department of Linguistics
9500 Gilman Drive #108, La Jolla, CA 92093-0108 USA

Abstract

When performing online language comprehension, comprehenders probabilistically anticipate upcoming words. Psycholinguistic studies thus often depend on accurately estimating stimulus predictability, either to control it or to study it, and this estimation is conventionally accomplished via the cloze task. But we do not know how effectively (or even, strictly speaking, whether) cloze probabilities reflect comprehender predictions. This is both methodologically worrisome and an obstacle to detailed understanding of online predictive mechanisms. Here, we demonstrate first that cloze probabilities vary substantially and systematically from normative corpus statistics, and secondly that some portion of these deviations are also reflected in online comprehension measures. Therefore, while there is some reason to be concerned that cloze norming may be distorting the results of psycholinguistic studies, these apparent distortions may instead reflect genuine errors in native speakers’ probabilistic models of their language.

Keywords: Psychology; Linguistics; Prediction; Language Understanding; Reading; Rationality

There’s currently a great deal of interest in how the brain makes and uses predictions (Bar, 2009). Within psycholinguistics, this interest dates back 30 years, to the discovery that the predictability of a word (its probability of occurrence given preceding context) has large and robust effects on both reading times (Ehrlich & Rayner, 1981) and event-related brain potentials (Kutas & Hillyard, 1984).
These early studies, and innumerable others since, rely on the cloze task (Taylor, 1953) to measure the predictability of their stimuli. Many more studies use cloze to control for predictability in order to isolate some other variable of interest. Yet despite its ubiquitous use as an estimate of predictability, we know almost nothing about what this task is actually measuring.

The cloze task consists of presenting a large group of participants with sentence stems like "In the winter and ____" and asking each to fill in the blank with some plausible continuation: some might write spring, others summer, and so on. We then count up what proportion of participants responded with each word; this proportion is called the cloze probability of that word in that context. Our goal is to get some estimate of the subjective probability distribution over continuations which skilled comprehenders compute implicitly during online comprehension; Fig. 1 summarizes the logical relationship between these subjective probability distributions, cloze probability distributions, and alternative corpus-based measurements.

[Figure 1 panels: (a) True language statistics in the world; (b) Individual knowledge of language; (c) Predictions used by individual in comprehension; (d) Cloze responses; (e) Corpus text; (f) Computational language model estimates]

Figure 1: An informal illustration of the situation faced by those who wish to study linguistic prediction. Language is actually used in some particular ways in the real world (a); some subset of these uses are recorded in corpora (e), and may be used to train computational language models (f). A different subset is experienced by human language users, who use these experiences to create some internal model of the statistics of their language (b). They then draw on this internal model to make predictions during online linguistic comprehension (c) and also, presumably, when responding in the cloze task (d). But the actual relationship between the items on the left side of the diagram remains obscure: do cloze completions match online predictions? Do online predictions match real-world statistics?

We know that the participants in a cloze task have some knowledge of their language (Fig. 1b), which they presumably draw on when producing continuations. But it isn’t clear how they use this knowledge. If they generated their cloze responses by sampling from their subjective probability distribution (‘probability matching’), then cloze probabilities would be identical to subjective probabilities.¹ But cloze norming is an offline, untimed, and rather unnatural task, which leaves ample room for conscious reflection and other strategic effects to distort this process; if participants are

¹ At least if we ignore inter-subject variation, as is conventional.
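As a rough illustration of the counting procedure described above, the following Python sketch computes cloze probabilities from a list of responses and then simulates the 'probability matching' case, in which each participant samples one continuation from a shared subjective distribution. The function names, the example responses, and the subjective distribution are all hypothetical and chosen only for illustration; none of them come from the study itself.

```python
from collections import Counter
import random


def cloze_probabilities(responses):
    """Return the proportion of participants who gave each continuation."""
    # Normalize case and whitespace so "Spring" and "spring " count together.
    cleaned = [r.strip().lower() for r in responses]
    counts = Counter(cleaned)
    total = len(cleaned)
    return {word: n / total for word, n in counts.items()}


def simulate_probability_matching(subjective, n_participants, seed=0):
    """Draw one response per simulated participant from `subjective`.

    Assumes every participant shares the same subjective distribution,
    i.e. inter-subject variation is ignored.
    """
    rng = random.Random(seed)
    words = list(subjective)
    weights = [subjective[w] for w in words]
    return rng.choices(words, weights=weights, k=n_participants)


# Hypothetical responses to the stem "In the winter and ____".
responses = ["spring", "summer", "spring", "fall",
             "Spring", "summer", "spring", "spring"]
cloze = cloze_probabilities(responses)
print(cloze)

# Hypothetical subjective distribution over continuations.
subjective = {"spring": 0.5, "summer": 0.3, "fall": 0.2}
simulated = cloze_probabilities(
    simulate_probability_matching(subjective, n_participants=10_000)
)
print(simulated)
```

Under this idealized sampling assumption, the simulated cloze proportions approach the subjective probabilities as the number of participants grows. Any strategic or reflective response behavior would break that correspondence, which is the concern raised in the text.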

[1] R. Baayen, et al. Mixed-effects modeling with crossed random effects for subjects and items, 2008.

[2] Thomas A. Schreiber, et al. The University of South Florida free association, rhyme, and word fragment norms, 2004, Behavior research methods, instruments, & computers: a journal of the Psychonomic Society, Inc.

[3] John Hale, et al. A Probabilistic Earley Parser as a Psycholinguistic Model, 2001, NAACL.

[4] J. Tenenbaum, et al. Optimal Predictions in Everyday Cognition, 2006, Psychological science.

[5] Max Coltheart, et al. The MRC Psycholinguistic Database, 1981.

[6] R. Shillcock, et al. Low-level predictive inference in reading: the influence of transitional probabilities on eye movements, 2003, Vision Research.

[7] Erez Lieberman Aiden, et al. Quantitative Analysis of Culture Using Millions of Digitized Books, 2010, Science.

[8] Nathaniel J. Smith, et al. Optimal Processing Times in Reading: A Formal Model and Empirical Investigation, 2008.

[9] Frank Keller, et al. Data from eye-tracking corpora as evidence for theories of syntactic processing complexity, 2008, Cognition.

[10] Marc Brys, et al. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English, 2009.

[11] J. Woolley, et al. Paradigms and processes in reading comprehension, 1982, Journal of experimental psychology. General.

[12] Wilson L. Taylor, et al. “Cloze Procedure”: A New Tool for Measuring Readability, 1953.

[13] H. Stadthagen-González, et al. The Bristol norms for age of acquisition, imageability, and familiarity, 2006, Behavior research methods.

[14] R. Levy. Expectation-based syntactic comprehension, 2008, Cognition.

[15] Michael Wilson, et al. MRC psycholinguistic database: Machine-usable dictionary, version 2.00, 1988.

[16] M. Kutas, et al. Brain potentials during reading reflect word expectancy and semantic association, 1984, Nature.

[17] K. Rayner, et al. Contextual effects on word perception and eye movements during reading, 1981.