HAPLOPOOL: improving haplotype frequency estimation through DNA pools and phylogenetic modeling

MOTIVATION: The search for genetic variants linked to complex diseases such as cancer, Parkinson's disease, or Alzheimer's disease may lead to better treatments. Because haplotypes can serve as proxies for hidden variants, one way to find the linked variants is to look for case-control associations between haplotypes and disease. Finding these associations requires high-quality estimates of haplotype frequencies in the population. To this end, we present HaploPool, a method for estimating haplotype frequencies from blocks of consecutive SNPs.

RESULTS: HaploPool leverages the efficiency of DNA pools and estimates population haplotype frequencies from pools of disjoint sets, each containing two or three unrelated individuals. We study the trade-off between pooling efficiency and the accuracy of haplotype frequency estimates. For a fixed genotyping budget, HaploPool performs favorably on pools of two individuals compared with PHASE, a state-of-the-art non-pooled phasing method. Of independent interest, HaploPool can also phase non-pooled genotype data with an accuracy approaching that of PHASE. We compared our algorithm with three programs that estimate haplotype frequencies from pooled data. HaploPool is an order of magnitude more efficient (at least six times faster) and considerably more accurate than previous methods. In contrast to previous methods, HaploPool performs well with missing data, genotyping errors, and long haplotype blocks (between 5 and 25 SNPs).
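To make the pooled estimation problem concrete, here is a minimal sketch of a generic expectation-maximization (EM) baseline for pooled data. It is not HaploPool's phylogeny-based algorithm. Each pool of k individuals is summarized, at every SNP, by how many of its 2k haplotypes carry allele 1. The sketch enumerates every multiset of 2k haplotypes consistent with those counts and re-estimates frequencies from posterior-weighted counts, assuming Hardy-Weinberg equilibrium. The function names (`em_pooled`, `compatible_multisets`, `multiset_prob`, `all_haplotypes`) and the 0/1 allele encoding are hypothetical, chosen only for illustration. The sketch does not model missing data or genotyping errors.

```python
from collections import Counter
from itertools import combinations_with_replacement, product
from math import factorial, prod


def all_haplotypes(m):
    """All 2**m haplotypes over an m-SNP block, encoded as tuples of 0/1 alleles."""
    return list(product((0, 1), repeat=m))


def compatible_multisets(pool_counts, haps, n_haps):
    """Unordered multisets of n_haps haplotype indices whose per-SNP allele sums equal pool_counts."""
    target = tuple(pool_counts)
    matches = []
    for combo in combinations_with_replacement(range(len(haps)), n_haps):
        sums = tuple(sum(haps[i][j] for i in combo) for j in range(len(target)))
        if sums == target:
            matches.append(Counter(combo))
    return matches


def multiset_prob(counts, freqs, n_haps):
    """Multinomial probability of this haplotype multiset, assuming random mating (HWE)."""
    coef = factorial(n_haps) // prod(factorial(c) for c in counts.values())
    return coef * prod(freqs[h] ** c for h, c in counts.items())


def em_pooled(pools, m, pool_size, iters=500, tol=1e-10):
    """Hypothetical EM estimator of haplotype frequencies from pooled allele counts.

    pools: one tuple per pool giving, for each SNP, how many of the pool's
    2 * pool_size haplotypes carry allele 1.
    """
    haps = all_haplotypes(m)
    n_haps = 2 * pool_size
    candidates = [compatible_multisets(p, haps, n_haps) for p in pools]
    if any(not c for c in candidates):
        raise ValueError("a pool's allele counts are inconsistent with its size")
    freqs = [1.0 / len(haps)] * len(haps)  # uniform starting point
    for _ in range(iters):
        expected = [0.0] * len(haps)
        for cands in candidates:
            # E-step: posterior weight of each compatible multiset in this pool.
            weights = [multiset_prob(c, freqs, n_haps) for c in cands]
            total = sum(weights)
            for c, w in zip(cands, weights):
                for h, cnt in c.items():
                    expected[h] += cnt * w / total
        # M-step: expected haplotype counts divided by total haplotypes sampled.
        new = [e / (n_haps * len(pools)) for e in expected]
        converged = max(abs(a - b) for a, b in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return {haps[i]: f for i, f in enumerate(freqs)}
```

For example, `em_pooled([(4, 4), (0, 0), (2, 2)], m=2, pool_size=2)` treats three pools of two individuals over a two-SNP block. The third pool is ambiguous on its own, since it is consistent with {00, 00, 11, 11}, {00, 01, 10, 11}, or {01, 01, 10, 10}. EM resolves it toward 00 and 11, the haplotypes the other pools support, so both converge to a frequency of about 0.5. The candidate enumeration grows as C(2^m + 2k - 1, 2k), so this brute-force baseline is only practical for very short blocks. That is the scaling problem that blocks of 5 to 25 SNPs pose for pooled estimators.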
