Robust analysis of the central tendency, simple and multiple regression and ANOVA: a step by step tutorial.

After much exertion and care to run an experiment in social science, the results should not be ruined by an improper analysis. Classical methods, such as the mean, the usual simple and multiple linear regressions, and the ANOVA, often require normality and the absence of outliers, conditions that are rarely met by experimental data. To mitigate this problem, researchers often resort to ad hoc methods such as the detection and deletion of outliers. In this tutorial, we show the shortcomings of such an approach. In particular, we show that outliers can sometimes be very difficult to detect and that deleting them distorts the full inferential procedure. A more appropriate and modern approach is to use a robust procedure, which provides estimation, inference and testing that are not influenced by outlying observations but correctly describe the structure of the bulk of the data. Such a procedure can also provide a diagnostic of the distance of any point or subject from the central tendency. Robust procedures can also be viewed as methods to check the appropriateness of the classical methods. To provide a step-by-step tutorial, we first present descriptive analyses that allow researchers to make an initial check of whether the data meet the conditions of application. Next, we compare classical and robust alternatives to ANOVA and regression and discuss their advantages and disadvantages. Finally, we present indices and plots that are based on the residuals of the analysis and can be used to determine whether the conditions of application of the analyses are respected. Examples on data from psychological research illustrate each of these points, and for each analysis and plot, R code is provided so that readers can apply the techniques presented throughout the article.
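As a rough illustration of the contrast drawn above, the following minimal sketch compares the mean with two common robust measures of central tendency and shows how a single outlier can escape a classical z-score check while standing out on a robust distance. It is not the tutorial's own R code: it is written in Python, the data are hypothetical, and the function names, the 20% trimming proportion, and the cut-off of 3 are assumptions chosen only for illustration.

```python
import statistics

def trimmed_mean(values, proportion=0.2):
    """Mean after dropping `proportion` of the points from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of points dropped per tail
    return statistics.fmean(ordered[k:len(ordered) - k])

def mad(values):
    """Median absolute deviation, rescaled to estimate the SD under normality."""
    med = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - med) for v in values])

def classical_z(values):
    """Distance of each point from the mean, in standard deviations."""
    centre = statistics.fmean(values)
    scale = statistics.stdev(values)
    return [abs(v - centre) / scale for v in values]

def robust_distance(values):
    """Distance of each point from the median, in MAD units."""
    centre = statistics.median(values)
    scale = mad(values)
    return [abs(v - centre) / scale for v in values]

# Hypothetical scores: nine ordinary values and one gross outlier (100).
scores = [10, 11, 12, 13, 14, 15, 16, 17, 18, 100]

print("mean:        ", statistics.fmean(scores))   # pulled towards the outlier
print("median:      ", statistics.median(scores))  # describes the bulk of the data
print("trimmed mean:", trimmed_mean(scores))

# The outlier inflates the standard deviation, so its own z-score stays
# below the assumed cut-off of 3, whereas its robust distance is far above it.
print("classical z of 100:   ", round(classical_z(scores)[-1], 2))
print("robust distance of 100:", round(robust_distance(scores)[-1], 2))
```

In this toy data set the mean is 22.6 while the median and the 20% trimmed mean are both 14.5. The outlier inflates the standard deviation enough to hide itself from the classical check, which is one way an outlier can be difficult to detect.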
