Haplotype‐based association analysis in cohort studies of unrelated individuals

Exploring the associations between haplotypes and disease phenotypes is an important step toward the discovery of genes that influence complex human diseases. When unrelated subjects are sampled, haplotypes are often ambiguous because of the unknown gametic phase of the measured sites along a chromosome. We consider cohort studies of unrelated subjects that collect data on potentially censored ages of onset of disease along with unphased genotypes and possibly time‐varying environmental factors. We formulate the effects of haplotypes and environmental variables on the time to disease occurrence through a semiparametric Cox proportional hazards model, which can accommodate a variety of genetic mechanisms as well as gene‐environment interactions. We develop a simple and fast expectation‐maximization algorithm to maximize the likelihood for the relative risks and other parameters based on the observable data of unphased genotypes and potentially censored ages of onset. The resulting estimators are consistent, efficient, and asymptotically normal. Simulation studies show that, for practical situations, the parameter estimators are virtually unbiased, the association tests maintain type I error rates near nominal levels, the confidence intervals have proper coverage probabilities, and the efficiency loss due to unknown gametic phase is small. © 2004 Wiley‐Liss, Inc.
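
To illustrate the phase ambiguity mentioned above, the following is a minimal sketch of the classical expectation-maximization estimate of haplotype frequencies from unphased genotypes at two biallelic sites, assuming Hardy-Weinberg equilibrium. It is not the algorithm of this paper, which additionally involves the Cox model and censored ages of onset; the function names, the 0/1/2 genotype coding, and the toy data are all assumptions made for illustration.

```python
# Illustrative sketch only: haplotype-frequency EM under Hardy-Weinberg
# equilibrium for two biallelic sites. Names and coding are hypothetical.

# The four possible haplotypes; each entry is the allele (0 or 1) at a site.
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(genotype):
    """Return unordered haplotype index pairs (i <= j) whose per-site
    allele sums equal the unphased genotype (coded 0, 1, or 2 per site)."""
    pairs = []
    for i, h1 in enumerate(HAPLOTYPES):
        for j in range(i, len(HAPLOTYPES)):
            h2 = HAPLOTYPES[j]
            if all(a + b == g for a, b, g in zip(h1, h2, genotype)):
                pairs.append((i, j))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=500, tol=1e-10):
    """Estimate haplotype frequencies from a list of unphased genotypes."""
    freqs = [1.0 / len(HAPLOTYPES)] * len(HAPLOTYPES)
    for _ in range(n_iter):
        counts = [0.0] * len(HAPLOTYPES)
        for g in genotypes:
            pairs = compatible_pairs(g)
            # E-step: posterior weight of each compatible pair under HWE
            # (factor 2 accounts for the two orderings of distinct haplotypes).
            weights = [(1 if i == j else 2) * freqs[i] * freqs[j]
                       for i, j in pairs]
            total = sum(weights)
            for (i, j), w in zip(pairs, weights):
                counts[i] += w / total
                counts[j] += w / total
        # M-step: expected haplotype counts divided by number of chromosomes.
        new_freqs = [c / (2 * len(genotypes)) for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


if __name__ == "__main__":
    # Toy data: only the double heterozygote (1, 1) is phase-ambiguous.
    data = [(0, 0)] * 3 + [(2, 2)] * 3 + [(1, 1)] * 2
    for hap, f in zip(HAPLOTYPES, em_haplotype_frequencies(data)):
        print(hap, round(f, 4))
```

In the toy data, only the double heterozygote `(1, 1)` is compatible with more than one haplotype pair, so the E-step splits its contribution between the pairs 00/11 and 01/10 in proportion to their current estimated probabilities.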
