As professionals, we want to use the best treatments; as patients, we want to be given them. Knowing whether an intervention works (or does not work) is fundamental to clinical decision making. However, clinical decision making involves more than simply taking published results of research directly to the bedside. Physicians need to consider how similar their patients are to those in the published studies, to take the values and preferences of their patients into account, and to consider their own experience with a given test or treatment.

Evidence from clinical research is becoming increasingly important in medical-practice decisions as more and better evidence is published. But when is the evidence strong enough to justify changing a practice? Individual studies that involve only small numbers of patients may have results that are distorted by the random play of chance and thus lead to less than optimal decisions. As is clear from other papers in this series, systematic reviews identify, critically appraise, and review all the relevant studies on a clinical question and are more likely to give a valid answer. They use explicit methods and quality standards to reduce bias. Their results are the closest we can come to reaching the truth given our current state of knowledge.

The questions about an intervention that a systematic review should answer are the following:
1. Does it work?
2. If it works, how well does it work in general and compared with placebo, no treatment, or other interventions that are currently in use?
3. Is it safe?
4. Will it be safe and effective for my patients?

Whereas the critical appraisal and qualitative synthesis provided by review articles can be interpreted directly, the numerical products of quantitative reviews can be more difficult to understand and apply in daily clinical practice.
This paper provides guidance on how to interpret the numerical and statistical results of systematic reviews, translate these results into more understandable terms, and apply them directly to individual patients. Many of these principles can also be used to interpret the numerical results of individual clinical studies. They are particularly relevant to systematic reviews, however, because such reviews contain more information than do primary studies and often exert greater influence than do individual studies.

Making Sense of the Numerical Results of Clinical Studies

Although the results of clinical studies can be expressed in intuitively meaningful ways, such results do not always easily translate into clinical decision making. For example, results are frequently expressed in terms of risk, which is an expression of the frequency of a given outcome. (Risks are probabilities, which can vary between 0.0 and 1.0. A probability of 0.0 means that the event will never happen, and a probability of 1.0 means that it always happens.) Consider a hypothetical study of the recurrence of migraine headaches in a control group receiving placebo and a treatment group receiving a new antimigraine preparation, drug M (a secondary prevention trial). Suppose that at the end of the trial, migraines recurred in 30% of the control group (the risk for recurrence was 0.30) but in only 5% of the drug M group (risk of 0.05) (Table 1).

Table 1. Numerical Expression of Hypothetical Clinical Trial Results

The outcomes of the study are clear enough for the two groups when they are examined separately. But clinicians and patients are more interested in the comparative results, that is, the outcome in one group relative to the outcome in the other group. This overall (comparative) result can be expressed in various ways. For example, the relative risk, which is the risk in the treatment group relative to that in the control group, is simply the ratio of the risks in the two groups.
In other words, relative risk is the risk in the treatment group divided by that in the control group, 0.05 ÷ 0.30, or 0.17. The comparison can also be expressed as the reduction in relative risk, which is the ratio between the decrease in risk (in the treatment group) and the risk in the control group, 0.25 ÷ 0.30, or 0.83 (Table 1). (The relative risk reduction can also be calculated as 1 − relative risk.)

Although the clinical meaning of relative risk (and relative risk reduction) is reasonably clear, relative risk has the distinct disadvantage that a given value (for example, 0.17) is the same whether the risk with treatment decreases from 0.80 to 0.14, from 0.30 to 0.05, from 0.001 to 0.00017, and so forth. The clinical implications of these changes clearly differ from one another enormously and depend on the specific disease and intervention.

An important alternative expression of comparative results, therefore, is the absolute risk reduction. Absolute risk reduction is determined by subtracting the risk in one group from the risk in the other (for example, the risk in the treatment group is subtracted from the risk in the placebo group). In the case of our migraine study, the absolute risk reduction would be 0.30 − 0.05, which equals 0.25, or 25 percentage points. In contrast, for a study in which the risk decreased from 0.001 to 0.00017, the absolute risk reduction would be only 0.00083, or 0.083 percentage points, which is a trivial change in comparison (Table 1).

This arithmetic emphasizes the difficulty of expressing the results of clinical studies in meaningful ways. Relative risk and relative risk reduction clearly give a quantitative sense of the effects of an intervention in proportional terms but provide no clue about the size of an effect on an absolute scale. In contrast, although it tells less about proportional effects, absolute risk reduction says a great deal about whether an effect is likely to be clinically meaningful.
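The comparative measures described above can be sketched in a few lines of Python. This is only an illustration: the names `RiskMeasures` and `risk_measures` are assumptions introduced here and do not come from the paper. The risks are the hypothetical values used in the text.

```python
from typing import NamedTuple


class RiskMeasures(NamedTuple):
    """Comparative results for one treatment-versus-control comparison."""
    relative_risk: float
    relative_risk_reduction: float
    absolute_risk_reduction: float


def risk_measures(control_risk: float, treatment_risk: float) -> RiskMeasures:
    """Compute the three measures from two risks (probabilities in 0..1)."""
    if not (0.0 < control_risk <= 1.0 and 0.0 <= treatment_risk <= 1.0):
        raise ValueError("risks must be probabilities, with a nonzero control risk")
    rr = treatment_risk / control_risk    # risk with treatment relative to control
    rrr = 1.0 - rr                        # relative risk reduction = 1 - relative risk
    arr = control_risk - treatment_risk   # absolute risk reduction
    return RiskMeasures(rr, rrr, arr)


# The three hypothetical scenarios from the text (control risk, treatment risk).
for control, treated in [(0.80, 0.14), (0.30, 0.05), (0.001, 0.00017)]:
    m = risk_measures(control, treated)
    print(f"{control} -> {treated}: "
          f"RR={m.relative_risk:.3f}, "
          f"RRR={m.relative_risk_reduction:.3f}, "
          f"ARR={m.absolute_risk_reduction:.5f}")
```

Running the loop shows the point made in the text: the relative risk is nearly identical in all three scenarios, while the absolute risk reduction differs by several orders of magnitude.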
Despite this benefit, even absolute risk reduction is problematic because it is a dimensionless, abstract number; that is, it lacks a direct connection with the clinical situation in which the patient and physician exist. However, another way of expressing clinical research results can provide that clinical link: the number needed to treat (NNT).

Number Needed To Treat

The NNT for a given therapy is simply the reciprocal of the absolute risk reduction for that treatment [1, 2]. In the case of our hypothetical migraine study (in which risk decreased from 0.30 without treatment with drug M to 0.05 with treatment with drug M, for a relative risk of 0.17, a relative risk reduction of 0.83, and an absolute risk reduction of 0.25), the NNT would be 1 ÷ 0.25, or 4. In concrete clinical terms, an NNT of 4 means that you would need to treat four patients with drug M to prevent migraine from recurring in one patient.

To emphasize the difference between the concepts embodied in NNT and relative risk, recall the various situations mentioned above, in all of which the relative risk was 0.17 but in which the absolute risk decreased from 0.80 to 0.14 in one case and from 0.001 to 0.00017 in another. Note that the corresponding NNTs in these two other cases are 1.5 and 1204, respectively: that is, you would need to treat 1.5 and 1204 patients to obtain a therapeutic result in these two situations compared with 4 patients with drug M (Table 1).

The NNT can be calculated easily and kept as a single numerical reminder of the effectiveness (or, as we will see, the potential for harm) of a particular therapy. As we suggested, the NNT has the crucial advantage of direct applicability to clinical practice because it shows the effort that is required to achieve a particular therapeutic target. The NNT has the additional advantage that it can be applied to any beneficial outcome or any adverse event (when it becomes the number needed to harm [NNH]).
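As a minimal sketch of the calculation just described, the NNT can be computed as the reciprocal of the absolute risk reduction. The function name `number_needed_to_treat` and its input checks are assumptions made for illustration and are not taken from the paper. The function returns the unrounded reciprocal; for the third scenario this is roughly 1204.8, which the text quotes as 1204.

```python
def number_needed_to_treat(control_risk: float, treatment_risk: float) -> float:
    """Return the unrounded NNT: 1 / (control risk - treatment risk)."""
    for risk in (control_risk, treatment_risk):
        if not 0.0 <= risk <= 1.0:
            raise ValueError("risks must be probabilities between 0 and 1")
    arr = control_risk - treatment_risk  # absolute risk reduction
    if arr <= 0.0:
        # Illustrative choice: without a risk reduction there is no NNT for benefit.
        raise ValueError("no risk reduction, so an NNT for benefit is undefined")
    return 1.0 / arr


# The three scenarios from the text, all with a relative risk of about 0.17.
scenarios = {
    "drug M (0.30 -> 0.05)": (0.30, 0.05),
    "high baseline risk (0.80 -> 0.14)": (0.80, 0.14),
    "low baseline risk (0.001 -> 0.00017)": (0.001, 0.00017),
}
for label, (control, treated) in scenarios.items():
    print(f"{label}: NNT = {number_needed_to_treat(control, treated):.1f}")
```

Raising an error when there is no risk reduction is one possible design choice; a fuller treatment would handle harmful outcomes separately as an NNH.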
The concept of NNT always refers to a comparison group (in which patients receive placebo, no treatment, or some other treatment), a particular treatment outcome, and a defined period of treatment. In other words, the NNT is the number of patients that you will need to treat with drug or treatment A to achieve an improvement in outcome compared with drug or treatment B for a treatment period of C weeks (or other unit of time). To be fully specified, an NNT or NNH must always state the comparator, the therapeutic outcome, and the duration of treatment that is necessary to achieve the outcome.

Important Qualities of the Number Needed To Treat

The NNT is treatment specific. It describes the difference between treatment and control in achieving a particular clinical outcome. Table 2 shows NNTs from a selection of systematic reviews and large randomized, controlled trials.

Table 2. Numbers Needed To Treat from Systematic Reviews and Randomized, Controlled Trials

A very small NNT (that is, one that approaches 1) means that a favorable outcome occurs in nearly every patient who receives the treatment and in few patients in a comparison group. Although NNTs close to 1 are theoretically possible, they are almost never found in practice. However, small NNTs do occur in some therapeutic trials, such as those comparing antibiotics with placebo in the eradication of Helicobacter pylori infection or those examining the use of insecticide for head lice (Table 2). An NNT of 2 or 3 indicates that a treatment is quite effective. In contrast, such prophylactic interventions as adding aspirin to streptokinase to reduce 5-week vascular mortality rates after myocardial infarction may have NNTs as high as 20 to 40 and still be considered clinically effective.

Limitations of the Number Needed To Treat

Although NNTs are powerful instruments for interpreting clinical effects, they also have important limitations.
First, an NNT is generally expressed as a single number, which is known as its point estimate. As with all experimental measurements, however, the true value of the NNT can be higher or lower than the point estimate determined through clinical studies. The 95% CIs of the NNT are useful in this regard because they provide an indication that,
[1] D. Carroll, et al. Paracetamol with and without codeine in acute pain: a quantitative systematic review. Pain, 1997.
[2] H. Hricak, et al. Evidence-based medicine. Singapore medical journal, 1997.
[3] H. J. McQuay, et al. A systematic review of antidepressants in neuropathic pain. Pain, 1996.
[4] Jonathan J. Deeks, et al. Down with odds ratios! Evidence Based Medicine, 1996.
[5] Patrice Degoulet, et al. The number needed to treat: a clinically useful nomogram in its proper context. BMJ, 1996.
[6] M. Tramèr, et al. Prevention of vomiting after paediatric strabismus surgery: a systematic review using the numbers-needed-to-treat method. British journal of anaesthesia, 1995.
[7] T. Fahey, et al. Evidence based purchasing: understanding results of clinical trials and systematic reviews. BMJ, 1995.
[8] Dawn Carroll, et al. Anticonvulsant drugs for management of pain: a systematic review. BMJ, 1995.
[9] J. Schoenen, et al. The effectiveness of combined oral lysine acetylsalicylate and metoclopramide compared with oral sumatriptan for migraine. The Lancet, 1995.
[10] D. Hogan. Short-duration treatment of fingernail dermatophytosis: a randomized, double-blind study with terbinafine and griseofulvin. Journal of the American Academy of Dermatology, 1995.
[11] J. Senior, et al. Misoprostol Reduces Serious Gastrointestinal Complications in Patients with Rheumatoid Arthritis Receiving Nonsteroidal Anti-Inflammatory Drugs. Annals of Internal Medicine, 1995.
[12] D. Cook, et al. Endoscopic Ligation Compared with Sclerotherapy for Treatment of Esophageal Variceal Bleeding. Annals of Internal Medicine, 1995.
[13] P. Crowley. Antenatal corticosteroid therapy: a meta-analysis of the randomized trials, 1972 to 1994. American journal of obstetrics and gynecology, 1995.
[14] P. Hazell, et al. Efficacy of tricyclic drugs in treating child and adolescent depression: a meta-analysis. BMJ, 1995.
[15] D. Sackett, et al. The number needed to treat: a clinically useful measure of treatment effect. BMJ, 1995.
[16] R. V. Stichele, et al. Systematic review of clinical efficacy of topical treatments for head lice. Authors' reply. 1995.
[17] M. Bräutigam, et al. Short-duration treatment of fingernail dermatophytosis: a randomized, double-blind study with terbinafine and griseofulvin. LAGOS III Study Group. Journal of the American Academy of Dermatology, 1995.
[18] T. Lancaster, et al. Primary care management of acute herpes zoster: systematic review of evidence from randomized controlled trials. The British journal of general practice: the journal of the Royal College of General Practitioners, 1995.
[19] C. Mulrow, et al. Hypertension in the elderly. Implications and generalizability of randomized trials. JAMA, 1994.
[20] M. Bracken, et al. Clinically useful measures of effect in binary analyses of randomized trials. Journal of clinical epidemiology, 1994.
[21] P. Lehert, et al. Naftidrofuryl in intermittent claudication: a retrospective analysis. Journal of cardiovascular pharmacology, 1994.
[22] E. Keeler, et al. Effect of epidural analgesia for labor on the cesarean delivery rate. Obstetrics and gynecology, 1994.
[23] B. Demichelis, et al. Completeness of reporting trial results: effect on physicians' willingness to prescribe. The Lancet, 1994.
[24] P. Cummings, et al. Antibiotics to prevent infection in patients with dog bite wounds: a meta-analysis of randomized trials. Annals of emergency medicine, 1994.
[25] R. Rosenfeld, et al. Clinical efficacy of antimicrobial drugs for acute otitis media: metaanalysis of 5400 children from thirty-three randomized trials. The Journal of pediatrics, 1994.
[26] J. Hirsh, et al. Graduated compression stockings in the prevention of postoperative venous thromboembolism. A meta-analysis. Archives of internal medicine, 1994.
[27] P. Tfelt-Hansen, et al. Sumatriptan for the Treatment of Migraine Attacks-A Review of Controlled Clinical Trials. Cephalalgia: an international journal of headache, 1993.
[28] C. D. Naylor, et al. Measured Enthusiasm: Does the Method of Reporting Trial Results Alter Perceptions of Therapeutic Effectiveness? Annals of Internal Medicine, 1992.
[29] Patrick Onghena, et al. Antidepressant-induced analgesia in chronic non-malignant pain: a meta-analysis of 39 placebo-controlled studies. Pain, 1992.
[30] Peter C. Gøtzsche. Sensitivity of effect variables in rheumatoid arthritis: A meta-analysis of 130 placebo controlled NSAID trials. 1990.
[31] P. Gøtzsche. Sensitivity of effect variables in rheumatoid arthritis: a meta-analysis of 130 placebo controlled NSAID trials. Journal of clinical epidemiology, 1990.
[32] Sarah Parish, et al. Randomized trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction: ISIS-2. ISIS-2 (Second International Study of Infarct Survival) Collaborative Group. Journal of the American College of Cardiology, 1988.
[33] D. L. Sackett, et al. An assessment of clinically useful measures of the consequences of treatment. The New England journal of medicine, 1988.
[34] A. Stark, et al. Respiratory distress syndrome. Pediatric clinics of North America, 1986.
[35] J. Gauthier, et al. [Respiratory distress syndrome in an adult]. L'union medicale du Canada, 1973.