A comparison of phasing algorithms for trios and unrelated individuals.

Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading phase-inference algorithms to handle father-mother-child trios. We performed a comprehensive assessment of the methods applied both to trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the expected levels of genotyping error and missing data. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million-SNP HapMap data set. Finally, we evaluated methods of estimating the value of r² between a pair of SNPs and concluded that all methods estimated r² well when the estimated value was ≥0.8.
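
The abstract does not spell out how r² is computed. As a minimal sketch, assuming the standard definition r² = D² / (p_A (1 − p_A) p_B (1 − p_B)) with D = p_AB − p_A p_B, the following Python function computes r² for two biallelic SNPs from phased haplotypes. The function name `r_squared` and the 0/1 allele coding are illustrative assumptions, and this is not the estimation procedure evaluated in the paper.

```python
def r_squared(haplotypes):
    """Compute r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of (a, b) tuples, one per chromosome, where a and b
    are the alleles at SNP 1 and SNP 2, coded 0 or 1 (an assumed coding).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    # Frequency of allele 1 at each SNP and of the joint (1, 1) haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")

    # D is the usual linkage disequilibrium coefficient.
    d = p_ab - p_a * p_b
    return d * d / denom


if __name__ == "__main__":
    # Two SNPs in complete LD: only the 0-0 and 1-1 haplotypes occur.
    print(r_squared([(0, 0), (0, 0), (1, 1), (1, 1)]))
    # Two SNPs in linkage equilibrium: all four haplotypes equally common.
    print(r_squared([(0, 0), (0, 1), (1, 0), (1, 1)]))
```

Because the calculation depends on the joint haplotype frequency p_AB, any phasing errors feed directly into the estimate, which is why r² accuracy is a natural downstream check on phase inference.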
