The Combination of Ecological and Case-Control Data

Ecological studies, in which data are available at the group level rather than the individual level, are susceptible to a range of biases because they cannot characterize within-group variability in exposures and confounders. To overcome these biases, we propose a hybrid design in which ecological data are supplemented with a sample of individual-level case-control data. We develop the likelihood for this design and illustrate its benefits via simulation, both in bias reduction relative to an ecological study and in efficiency gains relative to a conventional case-control study. An interesting special case of the proposed design arises when the ecological data are supplemented with case-only data. The design is illustrated using a dataset of county-specific lung cancer mortality rates in the state of Ohio from 1988.
