Two-stage methods for the analysis of pooled data

Epidemiologic studies of disease often produce inconclusive or contradictory results because of small sample sizes or regional variation in disease incidence or exposures. To clarify these issues, researchers occasionally pool and reanalyse original data from several large studies. In this paper we explore the use of a two-stage random-effects model for analysing pooled case-control studies and undertake a thorough examination of bias in the pooled estimator under various conditions. The two-stage model first analyses each study using the model appropriate to its design, adjusting for study-specific confounders, and then combines the study-specific adjusted log-odds ratios using a linear mixed-effects model; it is computationally simple and can incorporate study-level covariates and random effects. Simulations indicate that when the individual studies are large, two-stage methods produce nearly unbiased estimates of the exposure effects, and of their standard errors, from a generalized linear mixed model. By contrast, joint fixed-effects logistic regression produces attenuated exposure estimates and underestimates the standard error when heterogeneity is present. Although bias in the pooled regression coefficient increases with interstudy heterogeneity for both models, it is much smaller with the two-stage model. In pooled analyses, where covariates may not be uniformly defined and coded across studies, and are occasionally not measured in all studies, a joint model is often not feasible. The two-stage method is shown to be a simple, valid and practical method for the analysis of pooled binary data. The results are applied to a study of reproductive history and cutaneous melanoma risk in women using data from ten large case-control studies.
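
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: stage 1 uses ordinary unconditional logistic regression fitted by Newton-Raphson, whereas a real analysis would use the model appropriate to each study's design; stage 2 uses the DerSimonian-Laird moment estimator of the between-study variance in place of a full linear mixed-effects fit, and includes no study-level covariates. The function names (`fit_logistic`, `pool_random_effects`, `two_stage`) and the convention that the exposure sits in column 1 of each design matrix are hypothetical choices made for illustration.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Stage 1 (sketch): unconditional logistic regression by Newton-Raphson.

    X must already contain an intercept column. Returns the coefficient
    vector and its estimated covariance (inverse Fisher information).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])          # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    info = X.T @ (X * w[:, None])
    return beta, np.linalg.inv(info)

def pool_random_effects(b, v):
    """Stage 2 (sketch): random-effects pooling of study-specific log-odds
    ratios b with within-study variances v, using the DerSimonian-Laird
    moment estimator of the between-study variance tau2 (an assumption;
    the abstract describes a linear mixed-effects model)."""
    b = np.asarray(b, dtype=float)
    v = np.asarray(v, dtype=float)
    w = 1.0 / v
    fixed = np.sum(w * b) / np.sum(w)          # fixed-effects estimate
    q = np.sum(w * (b - fixed) ** 2)           # Cochran's Q statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(b) - 1)) / c)    # truncated at zero
    w_star = 1.0 / (v + tau2)                  # random-effects weights
    pooled = np.sum(w_star * b) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, se, tau2

def two_stage(studies, exposure_col=1):
    """Run stage 1 on each (X, y) pair, then pool the exposure coefficients.

    Each study may carry its own set of confounder columns; only the
    exposure column index is assumed to be common across studies.
    """
    b, v = [], []
    for X, y in studies:
        beta, cov = fit_logistic(X, y)
        b.append(beta[exposure_col])
        v.append(cov[exposure_col, exposure_col])
    return pool_random_effects(b, v)
```

Because each study is fitted separately, the design matrices need not share the same confounders, which mirrors the practical advantage noted above when covariates are not uniformly defined or measured across studies.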
