Regression-based association analysis with clustered haplotypes through use of genotypes.

Haplotype-based association analysis has been recognized as a tool with high resolution and potentially great power for identifying modest etiological effects of genes. In practice, however, its efficacy has not matched theoretical expectations. One primary cause is that such analysis tends to require a large number of parameters to capture the abundant haplotype varieties, and many of those parameters are expended on rare haplotypes, for which studies would have insufficient power to detect association even if it existed. To concentrate statistical power on more relevant inferences, we developed a regression-based approach that uses clustered haplotypes to assess haplotype-phenotype association. Specifically, we generalized the probabilistic clustering methods of Tzeng to the generalized linear model (GLM) framework established by Schaid et al. The proposed method uses unphased genotypes and incorporates both phase uncertainty and clustering uncertainty. Its GLM framework allows adjustment for covariates and can model both qualitative and quantitative traits. It can also evaluate either the overall haplotype association or the individual haplotype effects. We applied the proposed approach to study the association between hypertriglyceridemia and the apolipoprotein A5 gene. Through simulation studies, we assessed the performance of the proposed approach and demonstrated its validity and power in testing for haplotype-trait association.
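To make the idea of combining phase uncertainty with clustering uncertainty concrete, the following is a minimal sketch and not the estimator used in the study. It assumes that, for each subject, posterior probabilities over the haplotype pairs compatible with the unphased genotype are already available, and that each haplotype has a probabilistic allocation to clusters. The function name `expected_cluster_dosage`, the toy haplotype pairs, and the allocation matrix are all hypothetical. The resulting expected cluster dosages could serve as covariates in a GLM.

```python
import numpy as np

def expected_cluster_dosage(pair_probs, pairs, alloc):
    """Expected number of copies of each haplotype cluster per subject.

    pair_probs : (n_subjects, n_pairs) posterior probability of each
                 candidate haplotype pair (rows sum to 1); assumed given.
    pairs      : list of (h1, h2) haplotype indices, one per column.
    alloc      : (n_haplotypes, n_clusters) probability that a haplotype
                 belongs to each cluster (rows sum to 1); assumed given.
    """
    n_haps = alloc.shape[0]
    # Copies of each haplotype carried by each candidate pair.
    pair_dosage = np.zeros((len(pairs), n_haps))
    for k, (h1, h2) in enumerate(pairs):
        pair_dosage[k, h1] += 1
        pair_dosage[k, h2] += 1
    # Average over phase uncertainty, then over clustering uncertainty.
    hap_dosage = pair_probs @ pair_dosage
    return hap_dosage @ alloc

# Toy data (hypothetical): 3 haplotypes, 2 clusters, 2 candidate pairs.
pairs = [(0, 0), (1, 2)]
alloc = np.array([[1.0, 0.0],    # common haplotype 0 defines cluster 0
                  [0.0, 1.0],    # common haplotype 1 defines cluster 1
                  [0.5, 0.5]])   # rare haplotype 2 is split between both
pair_probs = np.array([[1.0, 0.0],    # subject 1: phase known
                       [0.5, 0.5]])   # subject 2: phase ambiguous
X = expected_cluster_dosage(pair_probs, pairs, alloc)
```

Each row of `X` sums to 2 because every subject carries two haplotypes, so in a regression one cluster column would typically be dropped as the baseline.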
