Gametic phase estimation over large genomic regions using an adaptive window approach

The authors present ELB, an easy-to-programme and computationally fast algorithm for inferring gametic phase in population samples of multilocus genotypes. Phase updates are made on the basis of a window of neighbouring loci, and the window size varies according to the local level of linkage disequilibrium. ELB is therefore particularly well suited to problems involving many loci and/or relatively large genomic regions, including those with a variable recombination rate. The authors simulated population samples of single nucleotide polymorphism (SNP) genotypes with varying levels of recombination and marker density, and found that ELB provides better local estimation of gametic phase than the PHASE or HTYPER programs, while its global accuracy is broadly similar. The relative improvement in local accuracy increases both with increasing recombination and with increasing marker density. Short tandem repeat (STR, or microsatellite) simulation studies demonstrate ELB's superiority over PHASE both globally and locally. ELB handles missing data; simulations show that phase recovery is virtually unaffected by up to 2 per cent missing data, but that phase estimation is noticeably impaired beyond this amount. The authors also applied ELB to datasets obtained from random pairings of 42 human X chromosomes typed at 97 diallelic markers in a 200 kb low-recombination region. Once again, they found ELB to have consistently better local accuracy than PHASE or HTYPER, while its global accuracy was close to the best.
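
To make the phase problem concrete, the following minimal Python sketch enumerates the haplotype pairs compatible with an unphased genotype, and shows how restricting attention to a window of neighbouring loci shrinks the number of candidate resolutions. It is an illustration only and is not the ELB algorithm: the function names `compatible_haplotype_pairs` and `window`, the 0/1 allele coding, and the fixed `half_width` are assumptions made for this example (ELB adapts its window to local linkage disequilibrium, which is not modelled here).

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List every haplotype pair consistent with an unphased genotype.

    genotype: list of (allele, allele) tuples, one per locus (hypothetical
    coding chosen for this sketch). A genotype with k heterozygous loci
    has 2**(k - 1) distinct unordered haplotype pairs (one pair if k == 0).
    """
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    base1 = [a for a, _ in genotype]
    base2 = [b for _, b in genotype]
    if not het:
        return [(tuple(base1), tuple(base2))]
    pairs = []
    # Keep the first heterozygous locus fixed so that each unordered
    # pair is generated exactly once; flip the remaining ones.
    for flips in product([False, True], repeat=len(het) - 1):
        h1, h2 = base1[:], base2[:]
        for locus, flip in zip(het[1:], flips):
            if flip:
                h1[locus], h2[locus] = h2[locus], h1[locus]
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def window(genotype, centre, half_width):
    """Return the loci within half_width of centre (fixed-size window;
    an adaptive method would choose half_width from local LD)."""
    lo = max(0, centre - half_width)
    hi = min(len(genotype), centre + half_width + 1)
    return genotype[lo:hi]


if __name__ == "__main__":
    g = [(0, 1), (1, 1), (0, 1), (0, 1)]  # three heterozygous loci
    print(len(compatible_haplotype_pairs(g)))                 # whole region
    print(len(compatible_haplotype_pairs(window(g, 2, 1))))   # local window
```

In this toy example the full four-locus genotype admits four phase resolutions, whereas the three-locus window centred on the third locus admits only two, which is why evaluating updates locally keeps the search small as the number of loci grows.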
