Effectiveness of computational methods in haplotype prediction

Abstract. Haplotype analysis has been used to narrow down the location of disease-susceptibility genes and to investigate many population processes. Computational algorithms have been developed to estimate haplotype frequencies and to predict haplotype phases from genotype data for unrelated individuals. However, the accuracy of these methods needs to be evaluated before their use can be advocated. We experimentally determined the haplotypes at two loci, the N-acetyltransferase 2 gene (NAT2, 850 bp, n=81) and a 140-kb region on chromosome X (n=77), each consisting of five single nucleotide polymorphisms (SNPs). We then empirically evaluated and compared the accuracy of the subtraction method, the expectation-maximisation (EM) method, and the PHASE method in haplotype frequency estimation and in haplotype phase prediction. Where there was near-complete linkage disequilibrium (LD) between SNPs (the NAT2 gene), all three methods provided effective and accurate estimates of haplotype frequencies and individual haplotype phases. For a genomic region in which marked LD was not maintained (the chromosome X locus), the computational methods were adequate for estimating overall haplotype frequencies, but none of them was accurate in predicting individual haplotype phases. For both genomic regions, the EM and PHASE methods provided better estimates of overall haplotype frequencies than the subtraction method.
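To make the EM approach named above concrete, the following is a minimal sketch of EM haplotype frequency estimation, simplified to two biallelic SNPs rather than the five SNPs per locus studied here. The function name `em_two_snp_haplotype_freqs`, the genotype encoding (count of allele "1" at each SNP), and the toy data are assumptions made for illustration; this is not the software evaluated in the study.

```python
from collections import Counter

def em_two_snp_haplotype_freqs(genotypes, n_iter=100, tol=1e-10):
    """Estimate frequencies of haplotypes 00, 01, 10, 11 by EM.

    genotypes: list of (g1, g2), where g1 and g2 are the number of
    copies (0, 1 or 2) of allele "1" at SNP 1 and SNP 2.
    Hypothetical helper for illustration only.
    """
    haps = ["00", "01", "10", "11"]
    n_chrom = 2 * len(genotypes)

    def alleles(g):
        # Map an allele count to the two alleles carried at that SNP.
        return {0: ("0", "0"), 1: ("0", "1"), 2: ("1", "1")}[g]

    # Individuals with at most one heterozygous SNP have a known phase.
    fixed = Counter()
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase ambiguous: 00/11 or 01/10
            continue
        a, b = alleles(g1), alleles(g2)
        fixed[a[0] + b[0]] += 1
        fixed[a[1] + b[1]] += 1

    freqs = {h: 0.25 for h in haps}  # uniform starting point
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 00/11.
        cis = freqs["00"] * freqs["11"]
        trans = freqs["01"] * freqs["10"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5

        # M-step: expected haplotype counts divided by chromosome count.
        counts = {h: float(fixed[h]) for h in haps}
        counts["00"] += w * n_double_het
        counts["11"] += w * n_double_het
        counts["01"] += (1 - w) * n_double_het
        counts["10"] += (1 - w) * n_double_het
        new_freqs = {h: counts[h] / n_chrom for h in haps}

        converged = max(abs(new_freqs[h] - freqs[h]) for h in haps) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs

# Toy data (invented): only 00 and 11 are seen in unambiguous individuals.
toy = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
estimated = em_two_snp_haplotype_freqs(toy)
```

In this toy sample the unambiguous individuals carry only haplotypes 00 and 11, so the EM iterations assign the double heterozygotes almost entirely to the 00/11 phase. This mirrors the abstract's observation that strong LD makes phase prediction easier; when all four haplotypes are common, the ambiguous individuals are split between the two phases and individual phase calls become less reliable.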
