Size of Treatment Effects and Their Importance to Clinical Research and Practice

In randomized clinical trials (RCTs), effect sizes seen in earlier studies guide both the choice of the effect size that sets the appropriate threshold of clinical significance and the rationale for believing that the true effect size is above that threshold and thus worth pursuing in an RCT. That threshold is used to determine the necessary sample size for the proposed RCT. Once the RCT is done, the data generated are used to estimate the true effect size and its confidence interval. Clinical significance is assessed by comparing the true effect size to the threshold effect size. In subsequent meta-analysis, this effect size is combined with others, ultimately to determine whether treatment (T) is clinically significantly better than control (C). Thus, effect sizes play an important role both in designing RCTs and in interpreting their results; but specifically which effect size? We review the principles of statistical significance, power, and meta-analysis, as well as commonly used effect sizes. The commonly used effect sizes are limited in conveying clinical significance. We recommend three equivalent effect sizes, chosen specifically to convey clinical significance: number needed to treat, area under the receiver operating characteristic curve comparing T and C responses, and success rate difference.
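The equivalence of the three recommended effect sizes can be illustrated with a minimal sketch. It assumes the usual definitions: the area under the curve (AUC) is the probability that a randomly chosen T response is better than a randomly chosen C response, with ties counted as one half; the success rate difference is SRD = 2·AUC − 1; and the number needed to treat is NNT = 1/SRD. The function names and the sample data are hypothetical and serve only as illustration.

```python
from itertools import product


def auc(treatment, control):
    """Estimate P(T > C) + 0.5 * P(T = C) over all treatment/control pairs.

    Assumes larger response values are clinically better.
    """
    wins = 0
    ties = 0
    for t, c in product(treatment, control):
        if t > c:
            wins += 1
        elif t == c:
            ties += 1
    return (wins + 0.5 * ties) / (len(treatment) * len(control))


def srd_from_auc(auc_value):
    """Success rate difference under the assumed relation SRD = 2*AUC - 1."""
    return 2.0 * auc_value - 1.0


def nnt_from_srd(srd):
    """Number needed to treat, NNT = 1/SRD; infinite when there is no effect.

    A negative value means C outperforms T.
    """
    if srd == 0:
        return float("inf")
    return 1.0 / srd


# Hypothetical binary outcomes (1 = success): 3/5 successes on T, 1/5 on C.
t_outcomes = [1, 1, 1, 0, 0]
c_outcomes = [1, 0, 0, 0, 0]

a = auc(t_outcomes, c_outcomes)
srd = srd_from_auc(a)
nnt = nnt_from_srd(srd)
print(f"AUC = {a:.2f}, SRD = {srd:.2f}, NNT = {nnt:.2f}")
```

With binary outcomes, the SRD obtained this way equals the plain difference in success rates (0.6 − 0.2 in the hypothetical data), so any one of the three quantities determines the other two.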
