A simulation study of the strength of evidence in the recommendation of medications based on two trials with statistically significant results

A typical rule that the Food and Drug Administration has used for the endorsement of new medications is to require two trials, each convincing on its own, that demonstrate effectiveness. "Convincing" may be interpreted subjectively, but the use of p-values and the focus on statistical significance (in particular, labeling p < .05 as significant) are pervasive in clinical research. Therefore, in this paper, we use simulations to calculate what it means to have exactly two trials, each with p < .05, in terms of the actual strength of evidence as quantified by Bayes factors. Our results show that different cases in which two trials have a p-value below .05 have wildly differing Bayes factors. Bayes factors of at least 20 in favor of the alternative hypothesis are not guaranteed; they are not reached in a large proportion of cases, in particular when the true effect size is small (0.2 standard deviations) or zero. In a non-trivial number of cases, the evidence actually points to the null hypothesis, in particular when the true effect size is zero, when the total number of trials is large (so that the two significant trials are accompanied by many non-significant ones), and when the number of participants in both groups is low. We recommend the use of Bayes factors as a routine tool for assessing the endorsement of new medications, because Bayes factors consistently quantify strength of evidence. Use of p-values may lead to paradoxical and spurious decision-making regarding the use of new medications.
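The gap between "two trials with p < .05" and the strength of evidence can be illustrated with a small simulation. The sketch below is not the study's own procedure. It assumes a known standard deviation of 1, so that each trial is summarized by a z-test on the standardized mean difference d with n participants per group. It also assumes a normal prior with a hypothetical scale tau = 0.5 on the effect size under the alternative, and it counts a trial as significant when the two-sided p < .05 and d > 0. Under these assumptions, the Bayes factor for all k trials depends only on the mean d̄ of the d values: BF10 = N(d̄; 0, v + tau²) / N(d̄; 0, v), with v = 2/(nk). All function names are illustrative.

```python
import math
import random


def normal_pdf(x: float, var: float) -> float:
    """Density of N(0, var) evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def two_sided_p(d: float, n: int) -> float:
    """Two-sided z-test p-value for a standardized mean difference d with
    n participants per group (assumes a known SD of 1, so se = sqrt(2 / n))."""
    z = d / math.sqrt(2.0 / n)
    return math.erfc(abs(z) / math.sqrt(2.0))


def bf10_pooled(ds: list[float], n: int, tau: float = 0.5) -> float:
    """BF10 for H1: delta ~ N(0, tau^2) versus H0: delta = 0 using all k trials.

    With equal n and known variance, the mean of the d values is sufficient
    for delta, with sampling variance v = 2 / (n * k). tau = 0.5 is a
    hypothetical prior scale, not a value taken from the study.
    """
    k = len(ds)
    v = 2.0 / (n * k)
    dbar = sum(ds) / k
    return normal_pdf(dbar, v + tau * tau) / normal_pdf(dbar, v)


def simulate(delta: float, n: int, k: int, n_keep: int = 2000,
             seed: int = 1, max_batches: int = 1_000_000) -> tuple[float, float]:
    """Draw batches of k trials with true effect delta and keep those with
    exactly two trials significant in the positive direction (p < .05, d > 0).

    Returns (share of kept batches with BF10 >= 20, share with BF10 < 1).
    """
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n)  # sampling SD of d under the known-variance assumption
    kept = strong = favours_null = 0
    for _ in range(max_batches):
        ds = [rng.gauss(delta, se) for _ in range(k)]
        n_sig = sum(1 for d in ds if d > 0 and two_sided_p(d, n) < 0.05)
        if n_sig != 2:
            continue  # condition on exactly two positive significant trials
        kept += 1
        bf = bf10_pooled(ds, n)
        if bf >= 20:
            strong += 1
        if bf < 1:
            favours_null += 1
        if kept == n_keep:
            return strong / kept, favours_null / kept
    raise RuntimeError("too few batches with exactly two significant trials")
```

For example, `simulate(0.0, n=50, k=20)` estimates, for a true effect of zero, how often two significant trials out of twenty reach BF10 ≥ 20 and how often the pooled evidence favors the null instead. Settings in which exactly two significant results are extremely unlikely, such as a true effect of 0.5 with k = 20, exhaust `max_batches` and raise an error rather than looping indefinitely.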

D. van Ravenzwaaij et al., A Simulation Study of the Strength of Evidence in the Endorsement of Medications Based on Two Trials with Statistically Significant Results, 2017.