Outcome-Dependent Sampling: An Efficient Sampling and Inference Procedure for Studies With a Continuous Outcome

To characterize the relation between an exposure and a continuous outcome, subjects can be sampled much as they are in a case–control study, so that the sample is enriched with subjects who are especially informative. In an outcome-dependent sampling design, observations made on a judiciously chosen subset of the base population can provide nearly the same statistical efficiency as observing the entire base population. Reaping the benefits of such sampling, however, requires an analysis that accounts for the outcome-dependent sampling. In this report, we compare the statistical efficiency of a simple random sample analyzed with standard methods against that of data collected with outcome-dependent sampling and analyzed by either of 2 appropriate methods. In addition, 3 real datasets were analyzed using an outcome-dependent sampling approach. The results demonstrate the improved statistical efficiency obtained with outcome-dependent sampling and its applicability in a wide range of settings. This design, coupled with an appropriate analysis, offers a cost-efficient approach to studying the determinants of a continuous outcome.
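The point that the analysis must account for the sampling can be illustrated with a small simulation. The sketch below is not the procedure evaluated in this report: it assumes a hypothetical linear model, a simplified design that draws fixed numbers of subjects from three strata defined by the outcome (below, within, and above 1 standard deviation of the mean), and a simple inverse-probability-weighted least squares fit as a stand-in for an appropriate analysis. All names, sample sizes, and parameter values are illustrative assumptions.

```python
import numpy as np


def wls_slope(x, y, w):
    """Slope from weighted least squares of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w  # scale each observation by its weight
    coef = np.linalg.solve(XtW @ X, XtW @ y)
    return coef[1]


def simulate(seed=0, N=100_000, beta0=1.0, beta1=0.5):
    """Compare naive and weighted fits under outcome-dependent sampling.

    Hypothetical setup: y = beta0 + beta1 * x + noise in a base
    population of size N; the tails of y are deliberately oversampled.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)
    y = beta0 + beta1 * x + rng.normal(size=N)

    # Strata defined by the outcome: 0 = low tail, 1 = middle, 2 = high tail.
    lo, hi = y.mean() - y.std(), y.mean() + y.std()
    stratum = np.where(y < lo, 0, np.where(y > hi, 2, 1))

    # Assumed allocation: the tails are heavily over-represented.
    n_per_stratum = {0: 250, 1: 300, 2: 250}
    idx, weights = [], []
    for s, n_s in n_per_stratum.items():
        members = np.flatnonzero(stratum == s)
        idx.append(rng.choice(members, size=n_s, replace=False))
        # Inverse of the inclusion probability n_s / N_s.
        weights.append(np.full(n_s, members.size / n_s))
    idx = np.concatenate(idx)
    weights = np.concatenate(weights)

    # Simple random sample of the same total size, for reference.
    srs = rng.choice(N, size=idx.size, replace=False)

    return {
        "truth": beta1,
        "srs": wls_slope(x[srs], y[srs], np.ones(srs.size)),
        "naive": wls_slope(x[idx], y[idx], np.ones(idx.size)),
        "ipw": wls_slope(x[idx], y[idx], weights),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>5}: {value:.3f}")
```

Under these assumptions, the unweighted fit to the outcome-enriched sample should overstate the slope, while the weighted fit should land much closer to the true value. Weighting corrects the bias but is generally not the most efficient choice, so this sketch illustrates only the need for a design-aware analysis, not the efficiency gains reported here.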
