Study Design Two-Phase, Generalized Case-Control Designs for the Study of Quantitative Longitudinal Outcomes

We propose a general class of 2-phase epidemiologic study designs for quantitative longitudinal data. These designs are useful when longitudinal outcome and covariate data are available in phase 1, but data on the exposure (e.g., a biomarker) can be collected on only a subset of subjects in phase 2. To conduct a study using a design in this class, one first summarizes each subject's longitudinal outcomes by fitting a simple linear regression of the response on a time-varying covariate. Sampling strata are then defined by splitting the distribution of the estimated intercepts or slopes into distinct low, medium, and high regions, and stratified sampling is conducted from strata defined by the intercepts, by the slopes, or by a mixture of the two. In general, samples enriched with extreme intercept values yield low variances for associations between time-fixed exposures and the outcome, whereas samples enriched with extreme slope values yield low variances for associations between time-varying exposures and the outcome (including interactions involving time-varying exposures). We describe ascertainment-corrected maximum likelihood and multiple-imputation estimation procedures that permit valid and efficient inference. All methodological developments are framed around a substudy examining genetic associations with lung function among continuous smokers in the Lung Health Study (United States and Canada, 1986–1994).
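The phase 2 selection steps described above (per-subject regression, low/medium/high strata, stratified sampling) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the 10th/90th percentile cutpoints, and the 50-subjects-per-stratum budget are hypothetical choices made for the example. Time is used as the time-varying covariate, and mixture designs are not shown. The function returns the stratum-specific sampling probabilities because any ascertainment correction has to account for how subjects were selected.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class SubjectSummary:
    subject_id: int
    intercept: float
    slope: float


def fit_subject_line(times, responses):
    """OLS fit of response on time for one subject; returns (intercept, slope)."""
    t_bar = statistics.fmean(times)
    y_bar = statistics.fmean(responses)
    s_tt = sum((t - t_bar) ** 2 for t in times)
    if s_tt == 0:
        raise ValueError("need at least two distinct time points")
    s_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, responses))
    slope = s_ty / s_tt
    return y_bar - slope * t_bar, slope


def summarize(cohort):
    """cohort maps subject_id -> (times, responses); one summary per subject."""
    return [SubjectSummary(sid, *fit_subject_line(t, y)) for sid, (t, y) in cohort.items()]


def assign_strata(values, lower_pct=10, upper_pct=90):
    """Label values 'low'/'medium'/'high' using empirical percentile cutpoints.

    The 10th/90th percentile defaults are an illustrative assumption.
    """
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return ["low" if v < lo else "high" if v > hi else "medium" for v in values]


def stratified_sample(summaries, target="intercept", n_per_stratum=None, seed=0):
    """Sample phase 2 subjects within intercept- or slope-defined strata.

    Returns (sorted selected ids, {stratum: sampling probability}).
    """
    if n_per_stratum is None:
        n_per_stratum = {"low": 50, "medium": 50, "high": 50}  # hypothetical budget
    labels = assign_strata([getattr(s, target) for s in summaries])
    rng = random.Random(seed)
    selected, probs = [], {}
    for stratum, n_k in n_per_stratum.items():
        members = [s.subject_id for s, lab in zip(summaries, labels) if lab == stratum]
        take = min(n_k, len(members))
        selected.extend(rng.sample(members, take))  # without replacement
        probs[stratum] = take / len(members) if members else 0.0
    return sorted(selected), probs


if __name__ == "__main__":
    # Simulated phase 1 cohort: 1000 subjects, six visits each (illustrative only).
    sim = random.Random(1)
    cohort = {}
    for i in range(1000):
        a_i, b_i = sim.gauss(3.0, 0.5), sim.gauss(-0.05, 0.02)
        times = [0, 1, 2, 3, 4, 5]
        cohort[i] = (times, [a_i + b_i * t + sim.gauss(0, 0.1) for t in times])
    ids, probs = stratified_sample(summarize(cohort), target="slope")
    print(len(ids), probs)
```

With 1000 subjects whose estimated slopes are distinct, the 10th/90th percentile cutpoints put 100 subjects in each tail stratum and 800 in the middle. Taking 50 from each stratum therefore gives sampling probabilities of 0.5, 0.0625, and 0.5. Under these assumptions, the tails are heavily oversampled relative to the middle, which is the enrichment of extreme values that the design relies on.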