A quantitative systematic review, or meta-analysis, uses statistical methods to combine the results of multiple studies. Meta-analyses have been done for systematic reviews of therapeutic trials, diagnostic test evaluations, and epidemiologic studies. Although the statistical methods involved may at first appear mathematically complex, their purpose is simple: They try to answer four basic questions. Are the results of the different studies similar? To the extent that they are similar, what is the best overall estimate? How precise and robust is this estimate? Finally, can dissimilarities be explained? This article provides guidance in understanding the key technical aspects of the quantitative approach to these questions. We have avoided using equations and statistical notation; interested readers will find implementations of the described methods in the listed references. We focus here on the quantitative synthesis of reports of randomized, controlled, therapeutic trials because far more meta-analyses have been published on therapeutic studies than on other types of studies. For practical reasons, we present a stepwise description of the tasks that are performed when statistical methods are used to combine data. These tasks are 1) deciding whether to combine data and defining what to combine, 2) evaluating the statistical heterogeneity of the data, 3) estimating a common effect, 4) exploring and explaining heterogeneity, 5) assessing the potential for bias, and 6) presenting the results.

Deciding Whether To Combine Data and Defining What To Combine

By the time one performs a quantitative synthesis, certain decisions should already have been made about the formulation of the question and the selection of included studies. These topics were discussed in two previous articles in this series [1, 2]. Statistical tests cannot compensate for a lack of common sense, clinical acumen, and biological plausibility in the design of the protocol of a meta-analysis. Thus, a reader of a systematic review should always address these issues before evaluating the statistical methods that have been used and the results that have been generated. Combining poor-quality data, overly biased data, or data that do not make sense can easily produce unreliable results.

The data to be combined in a meta-analysis are usually either binary or continuous. Binary data involve a yes/no categorization (for example, death or survival). Continuous data take a range of values (for example, change in diastolic blood pressure after antihypertensive treatment, measured in mm Hg). When one is comparing groups of patients, binary data can be summarized by using several measures of treatment effect that were discussed earlier in this series [3]. These measures include the risk ratio; the odds ratio; the risk difference; and, when study duration is important, the incidence rate. Another useful clinical measure, the number needed to treat (NNT), is derived from the inverse of the risk difference [3]. Treatment effect measures such as the risk ratio and the odds ratio provide an estimate of the relative efficacy of an intervention, whereas the risk difference describes the intervention's absolute benefit. The various measures of treatment effect offer complementary information, and all should be examined [4].
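The arithmetic behind these binary effect measures is straightforward. As a minimal sketch (not from the original article; the function name, variable names, and numbers are ours), the following Python computes the risk ratio, odds ratio, risk difference, and NNT from a single trial's 2x2 table:

```python
# Sketch: effect measures for one trial's 2x2 table.
# Inputs are event counts and group sizes in each arm; names are illustrative.

def binary_effect_measures(events_treated, n_treated, events_control, n_control):
    """Risk ratio, odds ratio, risk difference, and NNT for one trial."""
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control

    risk_ratio = risk_t / risk_c                                  # relative efficacy
    odds_ratio = (risk_t / (1 - risk_t)) / (risk_c / (1 - risk_c))
    risk_difference = risk_t - risk_c                             # absolute benefit
    # The NNT is the inverse of the (absolute) risk difference [3].
    nnt = 1 / abs(risk_difference) if risk_difference != 0 else float("inf")
    return risk_ratio, odds_ratio, risk_difference, nnt

# Example with made-up numbers: 15/120 events on treatment vs. 30/118 on control.
print(binary_effect_measures(15, 120, 30, 118))
```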
Continuous data can be summarized by the raw mean difference between the treatment and control groups when the treatment effect is measured on the same scale (for example, diastolic blood pressure in mm Hg), by the standardized mean difference when different scales are used to measure the same treatment effect (for example, different pain scales being combined), or by the correlation coefficient between two continuous variables [5]. The standardized mean difference, also called the effect size, is obtained by dividing the difference between the mean of the treatment group and the mean of the control group by the SD in the control group (a small numerical sketch appears at the end of this section).

Evaluating the Statistical Heterogeneity of the Data

This step is intended to answer the question, Are the results of the different studies similar (homogeneous)? It is important to answer this question before combining any data. To do so, one must calculate the magnitude of the statistical diversity (heterogeneity) of the treatment effect that exists among the different sets of data. Statistical diversity can be thought of as attributable to one or both of two causes. First, study results can differ because of random sampling error. Even if the true effect is the same in each study, the results of different studies would be expected to vary randomly around the true common fixed effect. This diversity is called the within-study variance. Second, each study may have been drawn from a different population, depending on the particular patients chosen and the interventions and conditions unique to the study. Therefore, even if each study enrolled a large patient sample, the treatment effect would be expected to differ. These differences, called random effects, describe the between-study variation with regard to an overall mean of the effects of all of the studies that could be undertaken.

The test most commonly used to assess the statistical significance of between-study heterogeneity is based on the chi-square distribution [6]. It provides a measure of the sum of the squared differences between the results observed and the results expected in each study, under the assumption that each study estimates the same common treatment effect (see the sketch at the end of this section). A large total deviation indicates that a single common treatment effect is unlikely, and any pooled estimate calculated must then account for the between-study heterogeneity. In practice, this test has low statistical power for detecting heterogeneity, and it has been suggested that a liberal significance level, such as 0.1, be used [6].

Estimating a Common Effect

The questions that this step tries to answer are, 1) to the extent that the data are similar, what is their best common point estimate of a therapeutic effect, and 2) how precise is this estimate? The mathematical process involved generally consists of combining (pooling) the results of different studies into an overall estimate. Compared with the results of individual studies, pooled results can increase statistical power and lead to more precise estimates of treatment effect. Each study is given a weight according to the precision of its results; the rationale is that studies with narrow CIs should be weighted more heavily than studies with greater uncertainty. The precision is generally expressed by the inverse of the variance of each study's estimate. The variance has two components: the variance of the individual study and the variance between different studies.
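As promised above, a one-function sketch of the standardized mean difference as just defined (names and numbers are ours):

```python
# Sketch: standardized mean difference (effect size), as defined above:
# raw mean difference divided by the SD in the control group.
def standardized_mean_difference(mean_treated, mean_control, sd_control):
    return (mean_treated - mean_control) / sd_control

# Example with made-up numbers: pain scores of 42 vs. 50, control-group SD of 16.
print(standardized_mean_difference(42.0, 50.0, 16.0))  # -0.5
```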
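Before turning to the pooling models themselves, the chi-square heterogeneity test described above can also be sketched. The statistic (often called Cochran's Q) compares each study's result with the inverse-variance-weighted mean, under the assumption of a common effect, and refers the total to a chi-square distribution with one fewer degree of freedom than the number of studies. A hedged sketch, assuming each study is summarized by an effect estimate (for example, a log odds ratio) and its within-study variance; all names and numbers are illustrative:

```python
# Sketch: chi-square (Cochran's Q) test of between-study heterogeneity.
from scipy.stats import chi2

def heterogeneity_q(effects, variances):
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    p_value = chi2.sf(q, df)                        # upper tail of chi-square
    return q, df, p_value

# Made-up log odds ratios and within-study variances from five hypothetical trials.
effects = [-0.45, -0.20, -0.60, 0.05, -0.35]
variances = [0.05, 0.08, 0.12, 0.10, 0.06]
q, df, p = heterogeneity_q(effects, variances)
print(f"Q = {q:.2f} on {df} df, P = {p:.3f}")  # compare P with the liberal 0.1 level
```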
When the between-study variance is found to be, or is assumed to be, zero, each study is simply weighted by the inverse of its own variance, which is a function of the study size and the number of events in the study. This approach characterizes a fixed-effects model, as exemplified by the Mantel-Haenszel method [7, 8] or the Peto method [9] for dichotomous data. The Peto method has been particularly popular in the past. It has the advantage of simple calculation; however, although it is appropriate in most cases, it may introduce large biases if the data are unbalanced [10, 11]. Random-effects models, in contrast, add the between-study variance to the within-study variance of each individual study when the pooled mean of the random effects is calculated. The random-effects model most commonly used for dichotomous data relies on the DerSimonian and Laird estimate of the between-study variance [12]. Fixed- and random-effects models for continuous data have also been described [13]. Pooled results are generally reported as a point estimate and CI, typically a 95% CI (both kinds of model are sketched in code below).

Other quantitative techniques for combining data, such as the Confidence Profile Method [14], use Bayesian methods to calculate posterior probability distributions for effects of interest. Bayesian statistics are based on the principle that each observation or set of observations should be viewed in conjunction with a prior probability describing the prior knowledge about the phenomenon of interest [15]. The new observations alter this prior probability to generate a posterior probability. Traditional meta-analysis assumes that nothing is known about the magnitude of the treatment effect before randomized trials are performed; in Bayesian terms, the prior probability distribution is noninformative. Bayesian approaches may also allow the incorporation of indirect evidence in generating prior distributions [14] and may be particularly helpful in situations in which few data from randomized studies exist [16]. Bayesian analyses may also be used to account for the uncertainty introduced by estimating the between-study variance in the random-effects model, leading to more appropriate estimates and predictions of treatment efficacy [17].

Exploring and Explaining Heterogeneity

The next important issue is whether the common estimate obtained in the previous step is robust. Sensitivity analyses determine whether the common estimate is influenced by changes in the assumptions and in the protocol for combining the data. A comparison of the results of fixed- and random-effects models is one such sensitivity analysis [18]. Generally, the random-effects model produces wider CIs than the fixed-effects model does, and the level of statistical significance may therefore differ depending on the model used. The pooled point estimate itself is less likely to be affected, although exceptions are possible [19]. Other sensitivity analyses may include examination of the residuals and the chi-square components [13] and assessment of the effect of deleting each study in turn; statistically significant results that depend on a single study may require further exploration (a sketch of such a leave-one-out analysis follows).
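To make these calculations concrete, here is a minimal sketch, again assuming each study is summarized by a log odds ratio and its within-study variance. It uses the generic inverse-variance fixed-effects estimate (not the Mantel-Haenszel or Peto calculations specifically), the DerSimonian-Laird moment estimate of the between-study variance for the random-effects pooling [12], and ends with the leave-one-out sensitivity analysis just described; all names and numbers are illustrative, reusing the made-up data above:

```python
# Sketch: inverse-variance fixed-effects pooling, DerSimonian-Laird
# random-effects pooling, and a leave-one-out sensitivity analysis.
import math

def pool_fixed(effects, variances):
    weights = [1.0 / v for v in variances]
    est = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return est, (est - 1.96 * se, est + 1.96 * se)   # point estimate, 95% CI

def pool_random(effects, variances):
    # DerSimonian-Laird moment estimate of the between-study variance tau^2.
    weights = [1.0 / v for v in variances]
    fixed_est, _ = pool_fixed(effects, variances)
    q = sum(w * (e - fixed_est) ** 2 for w, e in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # Each study's weight now also includes the between-study variance;
    # when tau^2 is estimated as zero, this reduces to the fixed-effects result.
    re_weights = [1.0 / (v + tau2) for v in variances]
    est = sum(w * e for w, e in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return est, (est - 1.96 * se, est + 1.96 * se)

effects = [-0.45, -0.20, -0.60, 0.05, -0.35]   # made-up log odds ratios
variances = [0.05, 0.08, 0.12, 0.10, 0.06]

print("fixed: ", pool_fixed(effects, variances))
print("random:", pool_random(effects, variances))

# Sensitivity analysis: delete each study in turn and re-pool.
for i in range(len(effects)):
    rest_e = effects[:i] + effects[i + 1:]
    rest_v = variances[:i] + variances[i + 1:]
    print(f"without study {i + 1}:", pool_random(rest_e, rest_v))
```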
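Cumulative Meta-Analysis

The text of this section is cut off in the source. Briefly, a cumulative meta-analysis recomputes the pooled estimate each time another study is added, typically in chronological order, showing how the evidence accumulates over time [20, 48]. As a minimal sketch, reusing the illustrative pool_random function and made-up data above (the chronological ordering here is assumed):

```python
# Sketch: cumulative meta-analysis; re-pool after each successive study,
# with studies assumed to be listed in chronological order.
for k in range(2, len(effects) + 1):        # pooling needs at least two studies
    est, (lo, hi) = pool_random(effects[:k], variances[:k])
    print(f"first {k} studies: estimate {est:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```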
[1] R. Moore, et al. Using Numerical Results from Systematic Reviews in Clinical Practice. Annals of Internal Medicine, 1997.
[2] S. B. Thacker, et al. On combining dose-response data from epidemiological studies by meta-analysis. Statistics in Medicine, 1995.
[3] R. Tweedie, et al. Meta-analytic approaches to dose-response relationships, with application in studies of lung cancer and exposure to environmental tobacco smoke. Statistics in Medicine, 1995.
[4] N. Laird, et al. Meta-analysis in clinical trials. Controlled Clinical Trials, 1986.
[5] K. Dickersin. The existence of publication bias and risk factors for its occurrence. JAMA, 1990.
[6] W. Richardson, et al. Selecting and Appraising Studies for a Systematic Review. Annals of Internal Medicine, 1997.
[7] C. D. Naylor, et al. Incorporating variations in the quality of individual randomized trials into meta-analysis. Journal of Clinical Epidemiology, 1992.
[8] G. H. Guyatt, et al. A Consumer's Guide to Subgroup Analyses. Annals of Internal Medicine, 1992.
[9] M. McIntosh, et al. The population risk as an explanatory variable in research synthesis of clinical trials. Statistics in Medicine, 1996.
[10] J. Robins, et al. Invited commentary: ecologic studies--biases, misconceptions, and counterexamples. American Journal of Epidemiology, 1994.
[11] T. C. Chalmers, et al. A method for assessing the quality of a randomized control trial. Controlled Clinical Trials, 1981.
[12] D. Cook, et al. Stress ulcer prophylaxis in the critically ill: a meta-analysis. The American Journal of Medicine, 1991.
[13] J. Villar, et al. Predictive ability of meta-analyses of randomised controlled trials. The Lancet, 1995.
[14] P. Easterbrook, et al. Publication bias in clinical research. The Lancet, 1991.
[15] B. Rosner, et al. Data trawling: to fish or not to fish. The Lancet, 1996.
[16] R. J. Hayes, et al. Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA, 1995.
[17] Stress ulcer prophylaxis in critically ill patients. The Lancet, 1989.
[18] J. Ioannidis, et al. On meta-analyses of meta-analyses. The Lancet, 1996.
[19] S. Greenland, et al. Bias in the one-step method for pooling study results. Statistics in Medicine, 1990.
[20] C. H. Schmid, et al. Cumulative meta-analysis of clinical trials builds evidence for exemplary medical care. Journal of Clinical Epidemiology, 1995.
[21] D. M. Eddy, et al. Meta-analysis by the Confidence Profile Method. 1992.
[22] G. Guyatt, et al. Stress ulcer prophylaxis in critically ill patients: resolving discordant meta-analyses. JAMA, 1996.
[23] D. B. Pillemer, et al. Summing Up: The Science of Reviewing Research. 1984.
[24] R. Peto, et al. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Progress in Cardiovascular Diseases, 1985.
[25] D. B. Dunson, et al. Bayesian Data Analysis. 2010.
[26] J. L. Vevea, et al. A general linear model for estimating effect size in the presence of publication bias. 1995.
[27] L. Hedges, et al. The Handbook of Research Synthesis. 1995.
[28] S. Greenland, et al. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. American Journal of Epidemiology, 1992.
[29] K. Dickersin, et al. Publication bias and clinical trials. Controlled Clinical Trials, 1987.
[30] H. Sacks, et al. Early or Deferred Zidovudine Therapy in HIV-Infected Patients without an AIDS-Defining Illness. Annals of Internal Medicine, 1995.
[31] C. B. Begg, et al. An Approach for Assessing Publication Bias Prior to Performing a Meta-Analysis. 1992.
[32] C. H. Schmid, et al. Large trials vs meta-analysis of smaller trials: how do their results compare? 1996.
[33] P. Sleight, et al. Publication bias. The Lancet, 1991.
[34] A. J. Lichtman, et al. Ecological Inference. 1978.
[35] D. G. Altman, et al. Better reporting of randomised controlled trials: the CONSORT statement. BMJ, 1996.
[36] M. Tryba. Prophylaxis of stress ulcer bleeding: a meta-analysis. Journal of Clinical Gastroenterology, 1991.
[37] L. Hedges, et al. Statistical Methods for Meta-Analysis. 1987.
[38] C. Begg, et al. Operating characteristics of a rank correlation test for publication bias. Biometrics, 1994.
[39] C. Counsell, et al. Formulating Questions and Locating Primary Studies for Inclusion in Systematic Reviews. Annals of Internal Medicine, 1997.
[40] T. C. Chalmers, et al. A comparison of statistical methods for combining event rates from clinical trials. Statistics in Medicine, 1989.
[41] W. Haenszel, et al. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 1959.
[42] L. Stewart, et al. Practical methodology of meta-analyses (overviews) using updated individual patient data. Cochrane Working Group. Statistics in Medicine, 1995.
[43] I. Olkin, et al. Statistical and theoretical considerations in meta-analysis. Journal of Clinical Epidemiology, 1995.
[44] W. DuMouchel, et al. Meta-analysis for dose-response models. Statistics in Medicine, 1995.
[45] M. Bracken, et al. Clinically useful measures of effect in binary analyses of randomized trials. Journal of Clinical Epidemiology, 1994.
[46] L. E. Moses, et al. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Statistics in Medicine, 1993.
[47] E. Antman, et al. Advantages and limitations of metaanalytic regressions of clinical trials data. The Online Journal of Current Clinical Trials, 1992.
[48] T. C. Chalmers, et al. Cumulative meta-analysis of therapeutic trials for myocardial infarction. The New England Journal of Medicine, 1992.
[49] S. Greenland, et al. Invited commentary: a critical look at some popular meta-analytic methods. American Journal of Epidemiology, 1994.
[50] I. Olkin, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA, 1996.
[51] D. J. Spiegelhalter, et al. Bayesian approaches to random-effects meta-analysis: a comparative study. Statistics in Medicine, 1995.
[52] J. D. Emerson, et al. An empirical study of the possible relation of treatment differences to quality scores in controlled randomized clinical trials. Controlled Clinical Trials, 1990.
[53] F. Mosteller, et al. Guidelines for Meta-analyses Evaluating Diagnostic Tests. Annals of Internal Medicine, 1994.
[54] J. Fleiss, et al. The statistical basis of meta-analysis. Statistical Methods in Medical Research, 1993.
[55] J. G. Thornton, et al. Clinical trials and rare diseases: a way out of a conundrum. BMJ, 1995.
[56] L. Hedges. Modeling publication selection effects in meta-analysis. 1992.
[57] H. Morgenstern. Uses of ecologic analysis in epidemiologic research. American Journal of Public Health, 1982.
[58] F. Mosteller, et al. Some Statistical Methods for Combining Experimental Results. International Journal of Technology Assessment in Health Care, 1990.
[59] J. Fleiss, et al. Statistical Methods for Rates and Proportions. 1973.
[60] A. R. Jadad, et al. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Controlled Clinical Trials, 1995.
[61] P. Ridker, et al. Discordance between Meta-analyses and Large-Scale Randomized, Controlled Trials: Examples from the Management of Acute Myocardial Infarction. Annals of Internal Medicine, 1995.
[62] I. Holme. Relation of coronary heart disease incidence and total mortality to plasma cholesterol reduction in randomised trials: use of meta-analysis. British Heart Journal, 1993.