Survival analysis with incomplete genetic data

Genetic data are now collected frequently in clinical studies and epidemiological cohort studies. For a large study, it may be prohibitively expensive to genotype all study subjects, especially with next-generation sequencing technology. Two-phase sampling, such as case-cohort and nested case-control sampling, is cost-effective in such settings but poses considerable analysis challenges, especially if efficient estimators are desired. Another type of missing data arises when investigators are interested in haplotypes or in genetic markers that are not on the genotyping platform used for the current study. Valid and efficient analysis of such missing data is also interesting and challenging. This article provides an overview of these issues and outlines some directions for future research.
