Haplotype frequency estimation in patient populations: The effect of departures from Hardy‐Weinberg proportions and collapsing over a locus in the HLA region

Haplotype analyses are an important area in the study of the genetic components of human disease. Associations between markers and disease loci that are not evident with a single marker locus may be identified in multi‐locus marker analyses using estimated haplotype frequencies (HFs). Procedures that use the expectation‐maximization (EM) algorithm to estimate HFs from unphased genotype data are in common use in genetic studies. The EM algorithm combines the observed unphased genotype frequencies with the assumption of Hardy‐Weinberg proportions (HWP) to converge on HF estimates. In this paper, we assess the accuracy of EM estimates of HFs in patients with type I diabetes for whom the true haplotypes are known; the data are analyzed ignoring family information, so that estimated frequencies can be compared with true frequencies. The data consist of six HLA loci with high levels of polymorphism and a range of departures from HWP and linkage equilibrium. While the overall accuracy of the EM estimates is good, there can be large over‐ and underestimates of particular HFs, even for common haplotypes, especially when the loci involved deviate significantly from HWP. Estimating HFs for three or more loci and then collapsing over loci to generate two‐locus haplotypes can improve the accuracy of the estimation. The collapsing procedure is most beneficial when one of the loci in the two‐locus haplotype of interest deviates significantly from HWP and the locus collapsed over is in linkage disequilibrium with the other loci. Genet. Epidemiol. 22:186–195, 2002. © 2002 Wiley‐Liss, Inc.
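The two procedures described above, EM estimation of HFs under HWP and collapsing a multi‐locus estimate over a locus, can be illustrated with a minimal sketch. This is not the software used in the study: the function names (`compatible_pairs`, `em_haplotype_frequencies`, `collapse`), the uniform starting frequencies, the stopping rule, and the toy genotypes are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    `genotype` is a tuple with one (allele, allele) pair per locus.
    """
    pairs = set()
    first, rest = genotype[0], genotype[1:]
    # Fix the orientation of the first locus and try both orientations of the others.
    for flips in product((False, True), repeat=len(rest)):
        h1, h2 = [first[0]], [first[1]]
        for (a, b), flip in zip(rest, flips):
            if flip:
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg proportions."""
    phases = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for ps in phases for pair in ps for h in pair})
    # Assumed starting point: equal frequency for every possible haplotype.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        counts = defaultdict(float)
        for ps in phases:
            # E-step: probability of each phase resolution under HWP.
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in ps]
            total = sum(weights)
            for (h1, h2), w in zip(ps, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        delta = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if delta < tol:
            break
    return freq


def collapse(freq, keep):
    """Sum haplotype frequencies over all loci whose index is not in `keep`."""
    out = defaultdict(float)
    for hap, p in freq.items():
        out[tuple(hap[i] for i in keep)] += p
    return dict(out)


# Hypothetical two-locus toy data: six unambiguous homozygotes, two double heterozygotes.
toy_genotypes = (
    [(("A", "A"), ("B", "B"))] * 3
    + [(("a", "a"), ("b", "b"))] * 3
    + [(("A", "a"), ("B", "b"))] * 2
)
toy_freq = em_haplotype_frequencies(toy_genotypes)
for hap, p in sorted(toy_freq.items()):
    print("-".join(hap), round(p, 4))
```

On this toy data the estimates move toward 0.5 for each of the A‐B and a‐b haplotypes, because the unambiguous homozygotes pull the double heterozygotes toward that phase. For a three‐locus estimate, `collapse(freq, keep=[0, 2])` would give the two‐locus frequencies for the first and third loci, which is the collapsing step in the sense used above.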
