Estimation Based on Case-Control Designs with Known Prevalence Probability

Regular case-control sampling is an extremely common design for estimating the effect of an exposure or treatment on a binary outcome when the proportion of cases (i.e., units with binary outcome equal to 1) in the population of interest is low. Case-control sampling yields a biased sample of the target population because it draws a disproportionate number of cases. Case-control studies are also commonly employed to estimate the effects of genetic markers or biomarkers on binary phenotypes. In this article we present a general method of estimation that relies on knowing the prevalence probability, conditional on the matching variable if matching is used. The proposed methodology, which involves a simple weighting scheme for cases and controls, maps any estimation method for a parameter developed for prospective sampling from the population of interest into an estimation method based on case-control sampling from this population. We show that applying this case-control weighting to an efficient estimator for a prospective sample from the target population yields an efficient estimator for matched and unmatched case-control sampling. In particular, we show how this generic methodology provides double robust, locally efficient targeted maximum likelihood estimators of the causal relative risk and causal odds ratio for regular case-control sampling and matched case-control sampling. Various extensions and generalizations of our methods are discussed.
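The weighting idea can be illustrated with a small simulation. The abstract does not spell out the weights, so the sketch below rests on an assumption: in an unmatched design, each case receives a weight proportional to the known prevalence q0 and each control a weight proportional to (1 - q0)/J, where J is the number of controls per case. The data-generating model, the function names, and the plain weighted-proportion estimator are all hypothetical choices made for illustration. This is not the targeted maximum likelihood estimator described above, and the simulation has no confounding.

```python
import random


def case_control_weights(y, q0):
    """Assumed weights: cases share total mass q0, controls share 1 - q0.

    Equivalent, up to a constant, to q0 per case and (1 - q0)/J per control.
    """
    n_cases = sum(y)
    n_controls = len(y) - n_cases
    return [q0 / n_cases if yi == 1 else (1.0 - q0) / n_controls for yi in y]


def weighted_risk(a, y, w, level):
    """Weighted estimate of P(Y = 1 | A = level)."""
    num = sum(wi for ai, yi, wi in zip(a, y, w) if ai == level and yi == 1)
    den = sum(wi for ai, wi in zip(a, w) if ai == level)
    return num / den


def simulate_relative_risk(seed=0, n_pop=200_000, n_cases=2_000, n_controls=4_000):
    """Hypothetical population with true relative risk 0.08 / 0.02 = 4."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_pop):
        a = 1 if rng.random() < 0.3 else 0              # binary exposure
        y = 1 if rng.random() < 0.02 + 0.06 * a else 0  # rare binary outcome
        population.append((a, y))

    # Prevalence treated as known (here computed from the full population).
    q0 = sum(y for _, y in population) / n_pop

    # Unmatched case-control sample: cases are heavily oversampled.
    cases = rng.sample([u for u in population if u[1] == 1], n_cases)
    controls = rng.sample([u for u in population if u[1] == 0], n_controls)
    sample = cases + controls
    a = [u[0] for u in sample]
    y = [u[1] for u in sample]

    # Weighted analysis versus an unweighted analysis of the same sample.
    w = case_control_weights(y, q0)
    weighted_rr = weighted_risk(a, y, w, 1) / weighted_risk(a, y, w, 0)
    unit = [1.0] * len(sample)
    naive_rr = weighted_risk(a, y, unit, 1) / weighted_risk(a, y, unit, 0)
    return q0, naive_rr, weighted_rr


if __name__ == "__main__":
    q0, naive_rr, weighted_rr = simulate_relative_risk()
    print(f"q0={q0:.4f} naive RR={naive_rr:.2f} weighted RR={weighted_rr:.2f}")
```

In this setup the unweighted relative risk is pulled toward 1 because cases are oversampled, while the weighted version should land near the true value of 4, up to sampling error. A matched design would instead need the prevalence conditional on the matching variable, which this sketch does not cover.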
