Evaluating the robustness of repeated measures analyses: The case of small sample sizes and nonnormal data

Repeated measures analyses of variance are the method of choice in many studies in experimental psychology and the neurosciences. Data from these fields are often characterized by small sample sizes, many levels of the within-subjects factor(s), and nonnormally distributed response variables such as response times. For a design with a single within-subjects factor, we investigated Type I error control in univariate tests with corrected degrees of freedom, in the multivariate approach, and in a mixed-model (multilevel) approach (SAS PROC MIXED) with the Kenward–Roger adjustment of the degrees of freedom. We simulated multivariate normal and nonnormal distributions with varied population variance–covariance structures (spherical and nonspherical), sample sizes (N), and numbers of factor levels (K). For normally distributed data, as expected, the univariate approach with the Huynh–Feldt correction controlled the Type I error rate with only very few exceptions, even when sample sizes as low as three were combined with many factor levels. The multivariate approach also controlled the Type I error rate, but it requires N ≥ K, that is, at least as many participants as factor levels. PROC MIXED often showed acceptable control of the Type I error rate for normal data, but it also produced a number of liberal or conservative results. For nonnormal data, all of the procedures showed clear deviations from the nominal Type I error rate in many conditions, even for sample sizes greater than 50. Thus, none of these approaches can be considered robust if the response variable is nonnormally distributed. The results indicate that both variance heterogeneity and covariance heterogeneity in the population covariance matrices affect the error rates.
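The univariate approach with corrected degrees of freedom can be illustrated with a minimal pure-Python sketch. It is not the authors' simulation code (the study used SAS), and every function name below (`gg_epsilon`, `hf_epsilon`, `rm_anova`, `type1_error_rate`, and the helpers) is hypothetical. The sketch computes the Greenhouse–Geisser estimate ε̂ from the double-centred sample covariance matrix, the Huynh–Feldt estimate for a single-group design, the F test with both degrees of freedom multiplied by ε, and a small Monte Carlo estimate of the Type I error rate. Assumptions: F-distribution tail probabilities come from the standard continued-fraction evaluation of the regularized incomplete beta function, and the simulated null data have independent, identically distributed cells, that is, only a spherical structure rather than the nonspherical structures the study also varied.

```python
import math
import random

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    # Continued fraction for the incomplete beta function (modified Lentz method).
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h

def reg_inc_beta(x, a, b):
    # Regularized incomplete beta function I_x(a, b).
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_sf(f, df1, df2):
    # P(F > f) for an F distribution; df1 and df2 may be non-integer.
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / (df2 + df1 * f), df2 / 2.0, df1 / 2.0)

def sample_cov(data):
    # K x K sample covariance of N x K data (rows = subjects, columns = levels).
    n, k = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(k)]
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in data) / (n - 1)
             for j in range(k)] for i in range(k)]

def gg_epsilon(cov):
    # Greenhouse-Geisser estimate (tr S*)^2 / ((K - 1) tr(S*^2)), S* = double-centred S.
    k = len(cov)
    row = [sum(r) / k for r in cov]
    grand = sum(row) / k
    dc = [[cov[i][j] - row[i] - row[j] + grand for j in range(k)] for i in range(k)]
    trace = sum(dc[i][i] for i in range(k))
    return trace ** 2 / ((k - 1) * sum(v * v for r in dc for v in r))

def hf_epsilon(eps_gg, n, k):
    # Huynh-Feldt estimate for a single-group design, truncated at 1.
    den = (k - 1) * (n - 1 - (k - 1) * eps_gg)
    if den <= 0:
        return 1.0
    return min(1.0, (n * (k - 1) * eps_gg - 2) / den)

def rm_anova(data, correction="hf"):
    # One-way repeated measures F test; both df are multiplied by epsilon.
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    subj = [sum(r) / k for r in data]
    cond = [sum(r[j] for r in data) / n for j in range(k)]
    ss_cond = n * sum((m - grand) ** 2 for m in cond)
    ss_err = sum((data[i][j] - subj[i] - cond[j] + grand) ** 2
                 for i in range(n) for j in range(k))
    df1, df2 = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df1) / (ss_err / df2)
    eps = gg_epsilon(sample_cov(data))
    if correction == "hf":
        eps = hf_epsilon(eps, n, k)
    return f, eps, f_sf(f, eps * df1, eps * df2)

def type1_error_rate(n, k, draw, reps=2000, alpha=0.05, seed=1):
    # Share of null data sets (i.i.d. cells, so all K means are equal) rejected at alpha.
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        data = [[draw(rng) for _ in range(k)] for _ in range(n)]
        hits += rm_anova(data)[2] < alpha
    return hits / reps

if __name__ == "__main__":
    print(rm_anova([[1, 2], [2, 4], [3, 5]]))  # K = 2, so epsilon = 1 and F = t^2
    print(type1_error_rate(10, 4, lambda r: r.gauss(0.0, 1.0)))
    print(type1_error_rate(10, 4, lambda r: r.lognormvariate(0.0, 1.0)))
```

Passing `correction="gg"` to `rm_anova` applies the Greenhouse–Geisser estimate instead. Swapping `gauss` for `lognormvariate` gives only an informal look at one nonnormal, spherical case. No particular rejection rate is claimed here, and a single condition says nothing about the nonspherical structures, the multivariate approach, or PROC MIXED.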
