Avoiding the language-as-a-fixed-effect fallacy: How to estimate outcomes of linear mixed models

Sterling Hutchinson (S.C.Hutchinson@tilburguniversity.nl)
Tilburg Centre for Cognition and Communication (TiCC), Tilburg University
PO Box 90153, 5000 LE, Tilburg, The Netherlands

Lei Wei (Lei.Wei@roswellpark.org)
Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute
Buffalo, NY 14263 USA

Max M. Louwerse (mlouwerse@tilburguniversity.nl)
Tilburg Centre for Cognition and Communication (TiCC), Tilburg University
PO Box 90153, 5000 LE, Tilburg, The Netherlands

Abstract

Since the 1970s, researchers in psycholinguistics and the cognitive sciences have been aware of the language-as-fixed-effect fallacy: the importance of not only averaging statistical analyses across participants (F1) but also across items (F2). Originally, the language-as-fixed-effect fallacy was countered by a combined measure (minF') calculated from the participant (F1) and item (F2) analyses. The scientific community, however, came to report separate participant and item (F1 and F2) regression analyses instead. More recently, researchers have started using linear mixed models, a more robust statistical methodology that includes both random participant and item factors in the same analysis. Mixed models offer various benefits, including being more robust to missing values and unequal cell sizes than other linear models such as ANOVAs. Yet it is unclear how conservative or liberal mixed models are in comparison with the traditional methods. Moreover, reanalyzing previously completed work with linear mixed models can be cumbersome. It is therefore desirable to understand the benefits of linear mixed models and to know under what conditions a significant result under one model predicts a significant result under another, so that the outcome of a mixed-effects model can be estimated from the traditional F1, F2, and minF' analyses. The current paper demonstrates that, at least for the simplest model, the F and p values of a linear mixed model can be estimated from the corresponding values of the more traditional analyses.

Keywords: statistics; parametric statistics; linear mixed models; Analysis of Variance; language-as-a-fixed-effect fallacy.

Introduction

Researchers in cognitive science, and in psycholinguistics specifically, have often analyzed their experimental data incorrectly simply by failing to use the proper statistical methods (Raaijmakers, Schrijnemakers, & Gremmen, 1999). This paper aims to answer the question whether the results of a proper statistical analysis can be estimated on the basis of the traditional, but improper, statistical analyses.

Many experimental studies in psycholinguistics take the form of a simple reading time (RT) experiment in which participants are asked to make semantic judgments about a word (or sentence, or paragraph). The time it takes each participant to respond to an item (RT) is typically the dependent variable. Most of the time, participants are drawn from a convenience sample of university undergraduates. To generalize findings to a larger population, however, participants are treated as a random factor in a regression analysis. Consequently, if the experiment were repeated with a different group of participants, the same effects are assumed to hold.
In other words, any variation in RT specific to an individual participant (e.g., one participant tending to respond faster overall than another) should be disregarded as random error. This allows generalization to a population larger than the particular participants included in the experiment. For the most part, researchers correctly identify when this is necessary and treat participants as random factors, keeping the Type I (and Type II) error rate low. However, the same treatment is not always applied to the item stimuli in an experiment.

Coleman (1964) and Clark (1973) recognized that although researchers in psycholinguistics correctly specified participants as random factors, variance in items (words, sentences, and paragraphs) was all but ignored. Just as with participants, Clark (1973) argued that in most cases researchers would like to be able to run their experiment with a different set of stimuli and find the same effects. He therefore argued that not only participants but also items should be treated as random factors. Just as the participants in an experiment do not represent an entire population, the items in an experiment are by no means representative of all the possibilities of language (Baayen, Davidson, & Bates, 2008; Barr, Levy, Scheepers, & Tily, 2013). The failure to also treat items as a random factor, and thereby to generalize beyond the specific items included in a particular experiment, is known as the language-as-a-fixed-effect fallacy (Clark, 1973).

Thankfully, in addition to pointing out this fallacy, Clark (1973) also proposed a simple solution to the problem: a combined statistic, minF', computed from the separate participant (F1) and item (F2) analyses.
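As usually written, the combined statistic pools the by-participant and by-item F values, with denominator degrees of freedom given by the standard Satterthwaite-style approximation reported alongside minF':

\[
\min F' = \frac{F_1 F_2}{F_1 + F_2},
\qquad
df' = \frac{(F_1 + F_2)^2}{F_1^2 / df_2 + F_2^2 / df_1}
\]

where df_1 and df_2 are the denominator (error) degrees of freedom of the F1 and F2 analyses, and the numerator degrees of freedom are the same as in the separate analyses. An effect is then treated as significant only if minF' exceeds the critical F value at these degrees of freedom.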
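Because minF' is simply arithmetic on the two F values and their degrees of freedom, it can be recomputed for published results without access to the raw data. A minimal sketch in Python (the function name and the example values are illustrative, not taken from the paper):

```python
from scipy.stats import f as f_dist

def min_f_prime(f1, df1_err, f2, df2_err, df_effect=1):
    """Combine a by-participant F1 and a by-item F2 into Clark's (1973) minF'.

    f1, f2           : F values from the participant and item analyses
    df1_err, df2_err : denominator (error) degrees of freedom of F1 and F2
    df_effect        : numerator (effect) degrees of freedom, shared by both
    Returns (minF', df_effect, approximate denominator df, p value).
    """
    min_f = (f1 * f2) / (f1 + f2)
    # Satterthwaite-style approximation for the denominator df
    df_err = (f1 + f2) ** 2 / (f1 ** 2 / df2_err + f2 ** 2 / df1_err)
    p_value = f_dist.sf(min_f, df_effect, df_err)
    return min_f, df_effect, df_err, p_value

# Illustration with made-up values: F1(1, 31) = 9.50, F2(1, 47) = 6.20
print(min_f_prime(9.50, 31, 6.20, 47))
```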
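The modern alternative described in the abstract, a linear mixed model with crossed random effects for participants and items, is typically fit in R with lme4 (e.g., lmer(rt ~ condition + (1 | participant) + (1 | item)); see Winter, 2013; Baayen et al., 2008). A rough Python analogue using statsmodels is sketched below; the data file, column names, and the single-group trick for crossed random effects are assumptions for illustration, not the authors' own analysis.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format reading-time data: one row per trial, with
# columns rt, condition, participant, and item (names are illustrative).
df = pd.read_csv("rt_data.csv")

# statsmodels handles crossed random effects by treating the whole data set
# as a single group and declaring each random factor as a variance component.
df["all_data"] = 1
vc = {"participant": "0 + C(participant)", "item": "0 + C(item)"}

# Fixed effect of condition; random intercepts for participants and items.
model = smf.mixedlm("rt ~ condition", data=df,
                    groups="all_data", vc_formula=vc, re_formula="0")
result = model.fit()
print(result.summary())
```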

References

[1] R. H. Baayen, et al. Analyzing Linguistic Data: Solutions to the exercises. 2008.

[2] J. Raaijmakers. A further look at the "language-as-fixed-effect fallacy". Canadian Journal of Experimental Psychology, 2003.

[3] R. Allan Reese, et al. Linear Mixed Models: A Practical Guide Using Statistical Software. 2008.

[4] H. Bergh, et al. Examples of Mixed-Effects Modeling with Crossed Random Effects and with Binomial Data. 2008.

[5] Bradford S. Jones, et al. Multilevel Models. 2007.

[6] H. H. Clark. The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. 1973.

[7] B. J. Winer. Statistical Principles in Experimental Design. 1992.

[8] James A. Bovaird, et al. On the use of multilevel modeling as an alternative to items analysis in psycholinguistic research. Behavior Research Methods, 2007.

[9] D. Bates, et al. Mixed-Effects Models in S and S-PLUS. 2001.

[10] Thomas D. Wickens, et al. On the choice of design and of test statistic in the analysis of experiments with sampled materials. 1983.

[11] D. Barr, et al. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 2013.

[12] Bodo Winter. Linear models and linear mixed effects models in R with linguistic applications. arXiv, 2013.

[13] R. Baayen, et al. Mixed-effects modeling with crossed random effects for subjects and items. 2008.

[14] J. Raaijmakers, et al. How to deal with "The language-as-fixed-effect fallacy": Common misconceptions and alternative solutions. 1999.