Cavalier Use of Inferential Statistics Is a Major Source of False and Irreproducible Scientific Findings

I uncover previously underappreciated systematic sources of false and irreproducible results in the natural, biomedical, and social sciences that are rooted in statistical methodology. They include the inevitable deviations from the basic assumptions behind statistical analyses and the use of various approximations. I show through a number of examples that (a) arbitrarily small deviations from distributional homogeneity can lead to arbitrarily large deviations in the outcomes of statistical analyses; (b) samples of random size may violate the Law of Large Numbers and thus are generally unsuitable for conventional statistical inference; (c) the same is true, in particular, when the random sample size and the observations are stochastically dependent; and (d) the use of the Gaussian approximation based on the Central Limit Theorem has dramatic implications for p-values and statistical significance, essentially making the pursuit of small significance levels and p-values for a fixed sample size meaningless. The latter is proven rigorously in the case of the one-sided Z test. This article could serve as a cautionary guide for scientists and practitioners employing statistical methods in their work.
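Point (d) can be illustrated with a minimal sketch. It is not the article's proof. It assumes i.i.d. Bernoulli observations (my choice of example) and uses the classical Berry-Esseen inequality, which bounds the distance between the distribution function of the standardized sum and the standard normal distribution function by C·ρ/(σ³·√n), where ρ is the third absolute central moment. The constant value 0.4748 is an assumption taken as a valid upper bound for C, and all function names are hypothetical. The inequality gives only a worst-case guarantee, so the sketch shows how large n must be before the Gaussian approximation error is certified to fall below a chosen significance level, and it compares an exact binomial p-value with its Z-test approximation.

```python
import math

# Assumed upper bound for the Berry-Esseen constant (i.i.d. case).
# Any other valid upper bound can be passed in through the `c` argument.
BERRY_ESSEEN_C = 0.4748


def berry_esseen_bound(rho, sigma, n, c=BERRY_ESSEEN_C):
    """Upper bound on sup_x |F_n(x) - Phi(x)| for a standardized i.i.d. sum.

    rho   -- third absolute central moment E|X - mu|^3
    sigma -- standard deviation of a single observation
    n     -- (fixed, non-random) sample size
    """
    return c * rho / (sigma ** 3 * math.sqrt(n))


def min_n_for_guarantee(rho, sigma, alpha, c=BERRY_ESSEEN_C):
    """Smallest n for which the bound above does not exceed alpha."""
    return math.ceil((c * rho / (sigma ** 3 * alpha)) ** 2)


def bernoulli_moments(p):
    """Return (rho, sigma) for a Bernoulli(p) observation."""
    sigma = math.sqrt(p * (1 - p))
    rho = p * (1 - p) * (p ** 2 + (1 - p) ** 2)
    return rho, sigma


def exact_p_value(k, n, p0):
    """Exact one-sided p-value P(S >= k) for S ~ Binomial(n, p0)."""
    return sum(
        math.comb(n, j) * p0 ** j * (1 - p0) ** (n - j)
        for j in range(k, n + 1)
    )


def z_test_p_value(k, n, p0):
    """One-sided Z-test p-value from the Gaussian approximation
    (no continuity correction)."""
    z = (k - n * p0) / math.sqrt(n * p0 * (1 - p0))
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    rho, sigma = bernoulli_moments(0.5)
    for alpha in (0.05, 0.01, 0.001):
        n_needed = min_n_for_guarantee(rho, sigma, alpha)
        print(f"alpha={alpha}: bound <= alpha only once n >= {n_needed}")
    # 65 successes in 100 fair-coin trials: exact vs. approximate p-value.
    print("exact :", exact_p_value(65, 100, 0.5))
    print("Z test:", z_test_p_value(65, 100, 0.5))
```

Because the guaranteed error decays only like 1/√n, the sample size required by this bound grows like 1/α², so for a fixed n the approximation error can be of the same order as, or larger than, the small significance levels being reported.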
