Case-control studies = odds ratios: blame the retrospective model.

Many epidemiologists and statisticians believe that the odds ratio is the only measure that can be reliably estimated from case-control studies. I have received many reviews of case-control study papers instructing us to change “rate ratio” to “odds ratio” when, in fact, it was the rate ratio that we had estimated. Further, in discussions about methods we have developed to estimate absolute risk measures from nested case-control studies, I have found that many are surprised to learn that absolute risk can be estimated systematically and reliably in the case-control setting. While case-control studies are best suited for estimation of relative measures, and I do not wish to minimize the challenges of estimation on other scales from case-control data generally, there are reliable methods for doing so. Reporting of absolute risk estimates seems desirable to supplement the usual case-control relative measure analyses, 1 but this is only rarely done even when it is feasible.

So why is there an odds-ratio fixation? I believe the core problem is that epidemiologists often think of case-control studies in terms of a “retrospective model.” According to this view, we start with a set of cases and controls. Then the covariates (exposures and other factors) occur as independent realizations with a distribution that depends on disease status. This is in contrast to the “prospective model” of cohort data, in which we start with a group of subjects with given covariates, and disease status is the result of independent realizations with probability dependent on the covariate values. Students learn that the odds ratio parameters in the retrospective logistic model are the same as the odds ratio parameters in the corresponding prospective logistic model, but that other measures do not translate.
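The textbook argument behind that last statement can be sketched in the simplest setting. The notation here is introduced only for illustration and is not taken from the text: D is a binary disease indicator and X a single binary exposure. Applying Bayes' theorem to each conditional probability gives:

```latex
\begin{align*}
\frac{P(D=1\mid X=x)}{P(D=0\mid X=x)}
  &= \frac{P(X=x\mid D=1)}{P(X=x\mid D=0)}\cdot\frac{P(D=1)}{P(D=0)},
  \qquad x\in\{0,1\},\\[4pt]
\mathrm{OR}_{\text{prospective}}
  &= \frac{P(D=1\mid X=1)\,/\,P(D=0\mid X=1)}{P(D=1\mid X=0)\,/\,P(D=0\mid X=0)}
   = \frac{P(X=1\mid D=1)\,/\,P(X=1\mid D=0)}{P(X=0\mid D=1)\,/\,P(X=0\mid D=0)}\\[4pt]
  &= \frac{P(X=1\mid D=1)\,/\,P(X=0\mid D=1)}{P(X=1\mid D=0)\,/\,P(X=0\mid D=0)}
   = \mathrm{OR}_{\text{retrospective}},\\[4pt]
\frac{P(D=1\mid X=1)}{P(D=1\mid X=0)}
  &= \frac{P(X=1\mid D=1)}{P(X=0\mid D=1)}\cdot\frac{P(X=0)}{P(X=1)}.
\end{align*}
```

The marginal odds P(D=1)/P(D=0) cancel in the odds ratio, so it can be written entirely in terms of the exposure distributions within cases and within controls. The risk ratio in the last line also involves the marginal exposure distribution P(X=x), which is a mixture of the case and control exposure distributions weighted by P(D=1). The retrospective model, taken by itself, contains no information about P(D=1), which is the usual reason given for saying that the risk ratio does not translate.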
The matter is further confused because valid estimation of odds ratio parameters from retrospective-model case-control data may be obtained using the corresponding prospective cohort data logistic regression, with the estimated “baseline odds” a nuisance parameter to be ignored. 2,3

But case-control studies are not backwards cohort (“trohoc”) studies. 4 A more realistic way to represent case-control designs is as sampling from a prospective cohort, where the sampling depends on disease outcomes and other information available on cohort subjects; this is what we have called the nested case-control model. 5 While this representation is certainly not new, and is often used in epidemiology textbooks as a “conceptual framework” to think about basic sampling and bias issues, almost invariably the retrospective model approach is used to develop the analysis methods. The alternative approach my colleagues and I have taken is to develop case-control study methods based completely on the nested case-control model, and to provide a unifying framework across cohort and case-control analysis methods, as well as across individually matched and unmatched case-control study designs. While there are still some important gaps to be filled, we have made progress.
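To make the idea of sampling from a cohort concrete, here is a minimal sketch in Python. It is a deliberately simplified illustration, not the methods referred to above: it uses an unmatched design, made-up cohort counts, expected rather than randomly drawn control counts, and a single hypothetical `control_fraction` that depends only on disease status. All function and variable names are invented for this example.

```python
from fractions import Fraction

# Hypothetical cohort counts, chosen only for illustration (not from any study).
COHORT = {
    "exposed": {"cases": 30, "noncases": 970},
    "unexposed": {"cases": 40, "noncases": 3960},
}


def sample_case_control(cohort, control_fraction):
    """Expected counts when all cases and a fixed fraction of noncases are sampled.

    Assumption: control sampling depends only on disease status, not on exposure.
    """
    return {
        group: {
            "cases": Fraction(row["cases"]),
            "noncases": row["noncases"] * control_fraction,
        }
        for group, row in cohort.items()
    }


def odds(row):
    """Disease odds (cases per noncase) within one exposure group."""
    return Fraction(row["cases"]) / Fraction(row["noncases"])


def odds_ratio(table):
    """Exposed-versus-unexposed odds ratio for a 2x2 table."""
    return odds(table["exposed"]) / odds(table["unexposed"])


def weighted_risk(row, control_fraction):
    """Risk recovered by weighting each sampled noncase by 1 / control_fraction."""
    reweighted_noncases = row["noncases"] / control_fraction
    return row["cases"] / (row["cases"] + reweighted_noncases)


control_fraction = Fraction(1, 10)  # assumed known from the cohort roster
sample = sample_case_control(COHORT, control_fraction)

print("cohort OR:", odds_ratio(COHORT))
print("sample OR:", odds_ratio(sample))
print("baseline odds, cohort:", odds(COHORT["unexposed"]))
print("baseline odds, sample:", odds(sample["unexposed"]))
for group in ("exposed", "unexposed"):
    print(group, "risk:", weighted_risk(sample[group], control_fraction))
```

In this sketch the odds ratio is the same in the sample as in the cohort, while the baseline odds in the sample are inflated by the factor 1 / `control_fraction`. Seen only through the retrospective model, that factor is unknown and the baseline odds are a nuisance parameter. When the case-control sample is treated as drawn from a cohort with a known sampling fraction, the same data yield the cohort risks in each exposure group.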

[1] Bryan Langholz, et al. Methods for the Analysis of Sampled Cohort Data in the Cox Proportional Hazards Model, 1995.

[2] R. Pyke, et al. Logistic disease incidence models and case-control studies, 1979.

[3] R. Carroll, et al. Prospective Analysis of Logistic Case-Control Studies, 1995.

[4] Anders Skrondal, et al. Stratified Case-Cohort Analysis of General Cohort Sampling Designs, 2007.

[5] B. Langholz, et al. Conditional logistic analysis of case-control studies with complex sampling, 2001, Biostatistics.

[6] S. Shak, et al. A population-based study of tumor gene expression and risk of breast cancer death among lymph node-negative patients, 2006, Breast Cancer Research.

[7] Sven Ove Samuelsen, et al. A pseudolikelihood approach to analysis of nested case-control studies, 1997.

[8] B. Langholz, et al. Risk set sampling designs for proportional hazard models, 1997.

[9] S. Wacholder, et al. The Case-Control Study as Data Missing by Design: Estimating Risk Differences, 1996, Epidemiology.

[10] Bryan Langholz, et al. Use of Cohort Information in the Design and Analysis of Case-Control Studies, 2007.

[11] Local central limit theorems, the high-order correlations of rejective sampling and logistic likelihood asymptotics, 2005, math/0506300.

[12] B. Langholz, et al. Estimation of excess risk from case-control data using Aalen's linear regression model, 1997, Biometrics.

[13] O. Aalen. Nonparametric Inference for a Family of Counting Processes, 1978.

[14] J. Benichou, et al. Methods of inference for estimates of absolute risk derived from population-based case-control studies, 1995, Biometrics.

[15] S. Greenland, et al. On the need for the rare disease assumption in case-control studies, 1982, American Journal of Epidemiology.

[16] N. Chatterjee, et al. Absolute risk of endometrial carcinoma during 20-year follow-up among women with endometrial hyperplasia, 2010, Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology.

[17] B. Langholz, et al. Estimation of absolute risk from nested case-control data, 1997, Biometrics.

[18] D. Oakes, et al. Survival Times: Aspects of Partial Likelihood, 1981.

[19] Sander Greenland, et al. Model-based estimation of relative risks and other epidemiologic measures in studies of common outcomes and in case-control studies, 2004, American Journal of Epidemiology.