Data Cleaning Basics: Best Practices in Dealing with Extreme Scores

Abstract: In quantitative research, it is critical to perform data cleaning to ensure that the conclusions drawn from the data are as generalizable as possible, yet few researchers report doing so (Osborne JW. Educ Psychol. 2008;28:1-10). Extreme scores are a significant threat to the validity and generalizability of results. In this article, I argue that researchers need to examine each extreme score to determine which of many possible causes produced it. Researchers can then take appropriate action, which has many beneficial effects, from reducing error variance and improving the accuracy of parameter estimates to reducing the probability of errors of inference.

[1] Sven Rabung, et al. [How to deal with missing data?], 2010, Psychotherapie, Psychosomatik, medizinische Psychologie.

[2] J. Osborne. Best Practices in Quantitative Methods, 2009.

[3] Jason W. Osborne, et al. Sweating the small stuff in educational psychology: how effect size and power reporting failed to change from 1969 to 1999, and what that means for the future of changing practices, 2008.

[4] Peter J. Rousseeuw, et al. Robust Regression and Outlier Detection, 2005, Wiley Series in Probability and Statistics.

[5] Ken Lane, et al. What Is Robust Regression and How Do You Do It, 2002.

[6] Lillian Le-Cointe, et al. Applied Statistics: A Handbook of Techniques (2nd ed.), 1998.

[7] D. W. Zimmerman, et al. Increasing the Power of Nonparametric Tests by Detecting and Downweighting Outliers, 1995.

[8] S. Rowland-Jones, et al. HIV-specific cytotoxic T-cells in HIV-exposed but uninfected Gambian women, 1995, Nature Medicine.

[9] D. W. Zimmerman. A Note on the Influence of Outliers on Parametric and Nonparametric Tests, 1994.

[10] Bruce Thompson, et al. Advances in Social Science Methodology, 1994.

[11] P. Jolicoeur, et al. A Solution to the Effect of Sample Size on Outlier Elimination, 1994.

[12] Teri A. Crosby, et al. How to Detect and Handle Outliers, 1993.

[13] Michele G. Jarrell, et al. A Comparison of Two Procedures, the Mahalanobis Distance and the Andrews-Pregibon Statistic, for Identifying Multivariate Outliers, 1992.

[14] Jeff Miller, et al. Short Report: Reaction Time Analysis with Outlier Exclusion: Bias Varies with Sample Size, 1991, The Quarterly Journal of Experimental Psychology. A, Human Experimental Psychology.

[15] J. L. Rasmussen, et al. Evaluating Outlier Identification Tests: Mahalanobis D Squared and Comrey Dk, 1988, Multivariate Behavioral Research.

[16] J. Schmee. Applied Statistics—A Handbook of Techniques, 1984.

[17] J. Stevens, et al. Outliers and influential data points in regression analysis, 1984.

[18] Steven J. Schwager, et al. Detection of Multivariate Normal Outliers, 1982.

[19] V. Barnett, et al. Outliers in Statistical Data, 1980.

[20] H. Wainer. Robust Statistics: A Survey and Some Prescriptions, 1976.

[21] S. Huck, et al. Some Comments Concerning the Use of Monotonic Transformations To Remove the Interaction in Two-Factor Anova's, 1975.

[22] F. J. Anscombe, et al. Rejection of Outliers, 1960.

[23] W. J. Dixon, et al. Analysis of Extreme Values, 1950.

[24] Jason W. Osborne, et al. The power of outliers (and why researchers should ALWAYS check for them), 2004.

[25] J. Osborne. Notes on the use of data transformations, 2002.

[26] Victoria P. Evans, et al. Strategies for Detecting Outliers in Regression Analysis: An Introductory Primer, 1999.

[27] D. W. Zimmerman, et al. Invalidation of Parametric and Nonparametric Statistical Tests by Concurrent Violation of Two Assumptions, 1998.

[28] David N. Perkins, et al. Introduction: New Conceptions of Thinking, 1993.

[29] G. Vining, et al. Data Analysis: A Model-Comparison Approach, 1989.

[30] R. Kay, et al. Applied Statistics. A Handbook of Techniques. 5th ed., 1984.

[31] D. Hawkins. Identification of Outliers, 1980, Monographs on Applied Probability and Statistics.

[32] Steven J. Schwager, et al. Detection of Multivariate Outliers, 1979.