Estimating Haplotype Effects on Dichotomous Outcome for Unphased Genotype Data Using a Weighted Penalized Log-Likelihood Approach

Objective: To develop a method that estimates haplotype effects on dichotomous outcomes when phase is unknown and that also yields reliable effect estimates for rare haplotypes. Methods: The method is a logistic regression in which every haplotype combination compatible with an individual's genotype receives a weight. Estimation uses an EM algorithm: the E-step estimates the weights, and the M-step maximizes the joint log-likelihood. When rare haplotypes were present, a penalty function was added to the log-likelihood; we compared four different penalties. To investigate the statistical properties of the method, we performed a simulation study under different scenarios. The evaluation criteria were the mean bias of the parameter estimates, the root mean squared error, the coverage probability, power, the Type I error rate and the false discovery rate. Results: For the unpenalized approach, mean bias was small, coverage probabilities were approximately 95%, power ranged from 15.2% to 44.7% depending on haplotype frequency, and the Type I error rate was around 5%. All penalty functions reduced the standard errors of the rare-haplotype estimates but introduced bias; the net effect of this trade-off was lower power. Conclusion: The unpenalized weighted log-likelihood approach performs well. A penalty function can help to estimate an effect for rare haplotypes.
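To make the E-step and the penalized objective concrete, the following is a minimal sketch and not the authors' implementation. It assumes an additive logistic model with one log-odds increment per haplotype copy, Hardy-Weinberg proportions for the prior weight of each haplotype pair, and a ridge penalty standing in for whichever of the four penalties is used. It covers only the outcome part of the log-likelihood. All function names, haplotype labels, frequencies and coefficients are hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear_predictor(pair, beta0, beta):
    # Assumed additive model: each haplotype copy adds its log-odds effect;
    # haplotypes absent from `beta` (e.g. the reference) contribute 0.
    return beta0 + sum(beta.get(h, 0.0) for h in pair)

def e_step_weights(y, pairs, freq, beta0, beta):
    """Posterior weight of each haplotype pair compatible with one genotype."""
    raw = []
    for h1, h2 in pairs:
        p = logistic(linear_predictor((h1, h2), beta0, beta))
        outcome_lik = p if y == 1 else 1.0 - p
        # Assumed Hardy-Weinberg prior for an unordered haplotype pair.
        prior = freq[h1] * freq[h2] * (1.0 if h1 == h2 else 2.0)
        raw.append(outcome_lik * prior)
    total = sum(raw)
    return [r / total for r in raw]

def penalized_weighted_loglik(data, weights, beta0, beta, lam=0.0):
    """Weighted outcome log-likelihood minus an (assumed) ridge penalty."""
    ll = 0.0
    for (y, pairs), w in zip(data, weights):
        for pair, w_ij in zip(pairs, w):
            p = logistic(linear_predictor(pair, beta0, beta))
            ll += w_ij * math.log(p if y == 1 else 1.0 - p)
    return ll - lam * sum(b * b for b in beta.values())

# Hypothetical two-SNP example: a double heterozygote is compatible with
# two haplotype pairs, a double homozygote with only one.
freq = {"AB": 0.5, "ab": 0.3, "Ab": 0.15, "aB": 0.05}
data = [
    (1, [("AB", "ab"), ("Ab", "aB")]),  # case, phase ambiguous
    (0, [("AB", "AB")]),                # control, phase known
]
beta0, beta = -0.5, {"ab": 0.4, "Ab": -0.2, "aB": 1.0}  # "AB" is the reference

weights = [e_step_weights(y, pairs, freq, beta0, beta) for y, pairs in data]
objective = penalized_weighted_loglik(data, weights, beta0, beta, lam=1.0)
```

In a full EM loop, the M-step would maximize `penalized_weighted_loglik` over `beta0` and `beta` with the weights held fixed (for example by Newton-Raphson) and update the haplotype frequencies. The weights would then be recomputed, and the two steps repeated until convergence.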
