Fragility Index, power, strength and robustness of findings in sports medicine and arthroscopic surgery: a secondary analysis of data from a study on use of the Fragility Index in sports surgery

Background: A recent study concluded that most findings reported as significant in sports medicine and arthroscopic surgery are not “robust” when evaluated with the Fragility Index (FI). We performed a secondary analysis of data from that study to investigate (1) the correctness of the reported findings, (2) the association between the FI, the p-value and post hoc power, (3) the median power to detect a medium effect size, and (4) the use of sample size analysis in these randomized controlled trials (RCTs).

Methods: In addition to the 48 studies listed in the appendix of the original study by Khan et al. (2017) [23], a follow-up literature search identified 18 additional studies, giving 66 studies in total. For the main outcome variable of each study, we calculated post hoc power, the p-value and the confidence interval, and recorded whether an a priori power analysis had been reported. For each included study we also calculated the power to detect a small (Cohen's h = 0.2), medium (h = 0.5) or large (h = 0.8) effect at baseline event proportions of 10% and 30%, and summarized these values as medians. Three simulated data sets were used to validate our findings.

Results: Inconsistencies were found in eight studies. An a priori power analysis was missing in roughly one-fourth of the studies (16/66). The median power to detect a medium effect size was 42% at a baseline event proportion of 10% and 43% at 30%. The FI was inherently associated with the achieved p-value and post hoc power.

Discussion: A relatively high proportion of studies had inconsistencies. The FI is a surrogate measure for the p-value and post hoc power. Based on these studies, the median power in this field of research is suboptimal. There is an urgent need to investigate how well research claims in orthopedics hold up when replicated, and how valid the published research findings are.
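
To make concrete why the FI tracks the p-value, here is a minimal sketch of the Fragility Index as defined in reference [12]: in the arm with fewer events, non-events are converted to events one at a time until the two-sided Fisher exact p-value reaches 0.05, and the FI is the number of conversions needed. This is an illustration under that assumption, not the code used in this study; the function names are hypothetical, and Fisher's test is implemented with the standard library only so the example is self-contained.

```python
from math import comb
from typing import Optional


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two arms; columns are (events, non-events). With all
    margins fixed, the top-left cell is hypergeometric; the p-value sums
    the probabilities of every table no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Small relative tolerance so tables tied with the observed one are
    # not dropped because of floating-point rounding.
    total = sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(e1: int, n1: int, e2: int, n2: int,
                    alpha: float = 0.05) -> Optional[int]:
    """Hypothetical helper: FI for e1/n1 vs e2/n2 events.

    Converts non-events to events in the arm with fewer events until the
    two-sided Fisher p-value reaches alpha. Returns None if the result is
    not significant to begin with, or if significance cannot be removed.
    """
    if e2 < e1:
        # Make arm 1 the arm with fewer events.
        e1, n1, e2, n2 = e2, n2, e1, n1
    if fisher_exact_two_sided(e1, n1 - e1, e2, n2 - e2) >= alpha:
        return None
    for k in range(1, n1 - e1 + 1):
        if fisher_exact_two_sided(e1 + k, n1 - e1 - k, e2, n2 - e2) >= alpha:
            return k
    return None
```

For a hypothetical trial with 0/10 events in one arm and 10/10 in the other, `fragility_index(0, 10, 10, 10)` returns 6: the p-value first rises above 0.05 (to about 0.087) after six non-events in the first arm are converted. Because the FI and the p-value are both computed from the same 2×2 table, a smaller p-value at fixed arm sizes generally requires more conversions, which is the sense in which the FI acts as a surrogate for the p-value.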
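
The power calculation described in the Methods can be sketched as follows, under stated assumptions: the comparison-arm proportion is derived from the baseline proportion and Cohen's h on the arcsine scale, and power is computed for a two-sided two-proportion z-test with equal arm sizes (normal approximation, pooled standard error under the null, no continuity correction). The abstract does not say which power formula the authors used, so this sketch is not guaranteed to reproduce the 42% and 43% reported; the function names are hypothetical. Passing a trial's observed proportions and arm size to the same function gives post hoc (observed) power, which is computed from the same numbers as the p-value and so adds no information beyond it, the point made in reference [4].

```python
from math import asin, pi, sin, sqrt
from statistics import NormalDist


def p2_from_cohens_h(p1: float, h: float) -> float:
    """Hypothetical helper: p2 with h = 2*asin(sqrt(p2)) - 2*asin(sqrt(p1))."""
    phi2 = 2 * asin(sqrt(p1)) + h
    if not 0 <= phi2 <= pi:
        raise ValueError("effect size h is out of range for this baseline")
    return sin(phi2 / 2) ** 2


def power_two_proportions(p1: float, p2: float, n_per_group: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test (assumed test).

    Equal arm sizes; pooled standard error under H0, unpooled under H1;
    normal approximation without continuity correction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se0 = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se1 = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    diff = abs(p2 - p1)
    # Probability, under H1, that the z statistic lands in either rejection tail.
    upper = 1 - z.cdf((z_crit * se0 - diff) / se1)
    lower = z.cdf((-z_crit * se0 - diff) / se1)
    return upper + lower


if __name__ == "__main__":
    # Medium effect (h = 0.5) at 10% and 30% baselines, for a hypothetical
    # trial with 40 patients per arm.
    for baseline in (0.10, 0.30):
        p2 = p2_from_cohens_h(baseline, 0.5)
        print(baseline, round(power_two_proportions(baseline, p2, 40), 3))
```

Repeating the calculation for the arm sizes of every included trial and taking `statistics.median` of the results yields a median power of the kind summarized in the abstract. Unlike a power calculation done purely on the arcsine scale, this z-test formula depends on the baseline proportion as well as on h.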

[1] Carla H. Lagorio, et al. Psychology, 1929, Nature.

[2] D. G. Altman, et al. The scandal of poor medical research, 1994, BMJ.

[3] S. Goodman. Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy, 1999, Annals of Internal Medicine.

[4] D. Heisey, et al. The Abuse of Power, 2001.

[5] J. Bernstein, et al. Sample size and statistical power of randomised, controlled trials in orthopaedics, 2001, The Journal of Bone and Joint Surgery. British Volume.

[6] J. Ioannidis. Why Most Published Research Findings Are False, 2005, PLoS Medicine.

[7] Daniel James O'Keefe, et al. Brief Report: Post Hoc Power, Observed Power, A Priori Power, Retrospective Power, Prospective Power, Achieved Power: Sorting Out Appropriate Uses of Statistical Power Analyses, 2007.

[8] J. Ioannidis. Why Most Discovered True Associations Are Inflated, 2008, Epidemiology.

[9] G. Cumming. Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better, 2008, Perspectives on Psychological Science.

[10] L. Feldman, et al. Evaluation and stages of surgical innovations, 2009, The Lancet.

[11] Brian A. Nosek, et al. Power failure: why small sample size undermines the reliability of neuroscience, 2013, Nature Reviews Neuroscience.

[12] Andrew Burke, et al. The statistical significance of randomized controlled trial results is frequently fragile: a case for a Fragility Index, 2014, Journal of Clinical Epidemiology.

[13] Brian A. Nosek, et al. An open investigation of the reproducibility of cancer biology research, 2014, eLife.

[14] David Colquhoun, et al. An investigation of the false discovery rate and the misinterpretation of p-values, 2014, Royal Society Open Science.

[15] Daniel E. Davis, et al. Is There Truly "No Significant Difference"? Underpowered Randomized Controlled Trials in the Orthopaedic Literature, 2015, The Journal of Bone and Joint Surgery. American Volume.

[16] D. Curran-Everett, et al. The fickle P value generates irreproducible results, 2015, Nature Methods.

[17] Benedikt V. Ehinger, et al. Faculty Opinions recommendation of PSYCHOLOGY. Estimating the reproducibility of psychological science, 2015.

[18] Michael C. Frank, et al. Estimating the reproducibility of psychological science, 2015, Science.

[19] John Carson Allen, et al. P-Hacking in Orthopaedic Literature: A Twist to the Tail, 2016, The Journal of Bone and Joint Surgery. American Volume.

[20] John P. A. Ioannidis, et al. What does research reproducibility mean?, 2016, Science Translational Medicine.

[21] David Colquhoun, et al. The reproducibility of research and the misinterpretation of p-values, 2017, bioRxiv.

[22] Andrew Gelman, et al. Measurement error and the replication crisis, 2017, Science.

[23] M. Bhandari, et al. The Fragility of Statistically Significant Findings From Randomized Trials in Sports Surgery: A Systematic Survey, 2017, The American Journal of Sports Medicine.

[24] J. Ioannidis, et al. When Null Hypothesis Significance Testing Is Unsuitable for Research: A Reassessment, 2016, bioRxiv.

[25] Brian A. Nosek, et al. Making sense of replications, 2017, eLife.

[26] John P. A. Ioannidis, et al. A manifesto for reproducible science, 2017, Nature Human Behaviour.

[27] J. Ioannidis, et al. Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature, 2017, PLoS Biology.

[28] Rickey E. Carter, et al. The Fragility Index: a P-value in sheep's clothing?, 2016, European Heart Journal.