How to Classify, Detect, and Manage Univariate and Multivariate Outliers, With Emphasis on Pre-Registration

Researchers often lack knowledge about how to deal with outliers when analyzing their data. Even more frequently, researchers do not pre-specify how they plan to manage outliers. In this paper, we aim to improve research practices by outlining what researchers need to know about outliers. We start by providing a functional definition of outliers. We then lay out a nomenclature for classifying outliers. This nomenclature clarifies what kinds of outliers can be encountered and serves as a guideline for deciding whether to keep, remove, or recode them. These decisions can affect the validity of statistical inferences as well as the reproducibility of experiments. Making informed decisions about outliers first requires proper detection tools. We remind readers why the most common outlier detection methods are problematic and recommend the median absolute deviation for detecting univariate outliers and the Mahalanobis-MCD distance for detecting multivariate outliers. We created an R package that makes these detection tests easy to perform. Finally, we promote pre-registration as a way to limit flexibility in data analysis when handling outliers.
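
To make the univariate recommendation concrete, the following is a minimal Python sketch of detection based on the median absolute deviation. It is not the R package mentioned above, and it does not cover the multivariate Mahalanobis-MCD case. The function name `mad_outliers`, the example data, and the cutoff of 3 scaled deviations are assumptions chosen for illustration; the constant 1.4826 is the usual scaling that makes the median absolute deviation comparable to a standard deviation under normality.

```python
from statistics import median


def mad_outliers(values, threshold=3.0, scale=1.4826):
    """Return the values lying more than `threshold` scaled MADs from the median.

    `threshold` is an illustrative default (an assumption), not a value
    prescribed by the text; fix it in advance, for example in a
    pre-registration, rather than tuning it after seeing the results.
    """
    med = median(values)
    # Median of absolute deviations from the median, rescaled so that it
    # estimates the standard deviation when the data are normal.
    mad = scale * median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the values are identical: the criterion is
        # undefined, so this sketch flags nothing.
        return []
    return [v for v in values if abs(v - med) / mad > threshold]


# Hypothetical data: six similar scores and one extreme score.
data = [2.1, 2.4, 2.2, 2.5, 2.3, 2.6, 9.8]
print(mad_outliers(data))
```

Because the median and the median absolute deviation are barely affected by the extreme score, that score cannot mask itself by inflating the cutoff, which is the weakness of criteria built on the mean and standard deviation.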
