Error Rates, Decisive Outcomes and Publication Bias with Several Inferential Methods

Background: Statistical methods for inferring the true magnitude of an effect from a sample should have acceptable error rates when the true effect is trivial (type I rates) or substantial (type II rates).

Objective: The objective of this study was to quantify the error rates, rates of decisive (publishable) outcomes and publication bias of five inferential methods commonly used in sports medicine and science. The methods were conventional null-hypothesis significance testing [NHST] (significant and non-significant imply substantial and trivial true effects, respectively); conservative NHST (the observed magnitude is interpreted as the true magnitude only for significant effects); non-clinical magnitude-based inference [MBI] (the true magnitude is interpreted as the magnitude range of the 90% confidence interval only for intervals not spanning substantial values of the opposite sign); clinical MBI (a possibly beneficial effect is recommended for implementation only if it is most unlikely to be harmful); and odds-ratio clinical MBI (implementation is also recommended when the odds of benefit outweigh the odds of harm, with an odds ratio >66).

Methods: Simulation was used to quantify standardized mean effects in 500,000 randomized, controlled trials each for true standardized magnitudes ranging from null through marginally moderate with three sample sizes: suboptimal (10 + 10), optimal for MBI (50 + 50) and optimal for NHST (144 + 144).

Results: Type I rates for non-clinical MBI were always lower than for NHST. When type I rates for clinical MBI were higher, most errors were debatable, given the probabilistic qualification of those inferences (unlikely or possibly beneficial). NHST often had unacceptable rates for either type II errors or decisive outcomes, and it had substantial publication bias with the smallest sample size, whereas MBI had no such problems.

Conclusion: MBI is a trustworthy, nuanced alternative to NHST, which it outperforms in terms of the sample size, error rates, decision rates and publication bias.
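The kind of simulation described in the Methods can be illustrated with a minimal sketch. It is not the authors' code and makes several assumptions that the abstract does not state: the smallest substantial standardized effect is taken as 0.2, the population standard deviation is fixed at 1 so that the raw difference in means stands in for the standardized effect, confidence limits use a normal rather than a t approximation, and the function names (`simulate_trial`, `classify`, `decisive_rates`) are hypothetical. Only conventional NHST significance and the non-clinical MBI "clear" criterion are shown, and far fewer trials are run than the 500,000 per condition reported.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Assumption: 0.2 is used as the smallest substantial standardized effect.
SMALLEST_IMPORTANT = 0.2
Z90 = NormalDist().inv_cdf(0.95)   # multiplier for a 90% interval (normal approx.)
Z95 = NormalDist().inv_cdf(0.975)  # multiplier for a two-sided 5% test (normal approx.)


def simulate_trial(true_effect, n_per_group, rng):
    """Simulate one two-group trial; return the observed difference and its standard error."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    diff = mean(treated) - mean(control)
    pooled_sd = math.sqrt((stdev(control) ** 2 + stdev(treated) ** 2) / 2)
    se = pooled_sd * math.sqrt(2 / n_per_group)
    return diff, se


def classify(diff, se):
    """Return (significant under NHST, clear under non-clinical MBI)."""
    significant = abs(diff) > Z95 * se
    lower, upper = diff - Z90 * se, diff + Z90 * se
    # Unclear when the 90% interval spans substantial values of both signs.
    unclear = lower < -SMALLEST_IMPORTANT and upper > SMALLEST_IMPORTANT
    return significant, not unclear


def decisive_rates(true_effect, n_per_group, n_trials=2000, seed=1):
    """Proportion of simulated trials that are significant (NHST) and clear (MBI)."""
    rng = random.Random(seed)
    n_significant = 0
    n_clear = 0
    for _ in range(n_trials):
        significant, clear = classify(*simulate_trial(true_effect, n_per_group, rng))
        n_significant += significant
        n_clear += clear
    return n_significant / n_trials, n_clear / n_trials


if __name__ == "__main__":
    # The three per-group sample sizes named in the Methods, with a null true effect.
    for n in (10, 50, 144):
        sig_rate, clear_rate = decisive_rates(0.0, n)
        print(f"n = {n} + {n}: significant {sig_rate:.3f}, clear {clear_rate:.3f}")
```

Running the script prints, for each sample size, the share of null-effect trials that would be significant under NHST and the share that would be clear under the non-clinical MBI rule as sketched here. Changing `true_effect` gives the corresponding rates for non-null true effects.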
