Little loss of information due to unknown phase for fine-scale linkage-disequilibrium mapping with single-nucleotide-polymorphism genotype data.

We present the results of a simulation study that indicate that true haplotypes at multiple, tightly linked loci often provide little extra information for linkage-disequilibrium fine mapping, compared with the information provided by corresponding genotypes, provided that an appropriate statistical analysis method is used. In contrast, a two-stage approach to analyzing genotype data, in which haplotypes are inferred and then analyzed as if they were true haplotypes, can lead to a substantial loss of information. The study uses our COLDMAP software for fine mapping, which implements a Markov chain-Monte Carlo algorithm that is based on the shattered coalescent model of genetic heterogeneity at a disease locus. We applied COLDMAP to 100 replicate data sets simulated under each of 18 disease models. Each data set consists of haplotype pairs (diplotypes) for 20 SNPs typed at equal 50-kb intervals in a 950-kb candidate region that includes a single disease locus located at random. The data sets were analyzed in three formats: (1). as true haplotypes; (2). as haplotypes inferred from genotypes using an expectation-maximization algorithm; and (3). as unphased genotypes. On average, true haplotypes gave a 6% gain in efficiency compared with the unphased genotypes, whereas inferring haplotypes from genotypes led to a 20% loss of efficiency, where efficiency is defined in terms of root mean integrated square error of the location of the disease locus. Furthermore, treating inferred haplotypes as if they were true haplotypes leads to considerable overconfidence in estimates, with nominal 50% credibility intervals achieving, on average, only 19% coverage. We conclude that (1). given appropriate statistical analyses, the costs of directly measuring haplotypes will rarely be justified by a gain in the efficiency of fine mapping and that (2). a two-stage approach of inferring haplotypes followed by a haplotype-based analysis can be very inefficient for fine mapping, compared with an analysis based directly on the genotypes.

[1]  N. Metropolis,et al.  Equation of State Calculations by Fast Computing Machines , 1953, Resonance.

[2]  R. Hudson Properties of a neutral allele model with intragenic recombination. , 1983, Theoretical population biology.

[3]  P. Marjoram,et al.  Ancestral Inference from Samples of DNA Sequences with Recombination , 1996, J. Comput. Biol..

[4]  P. Donnelly,et al.  Progress in population genetics and human evolution , 1997 .

[5]  Dani Gamerman,et al.  Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference , 1997 .

[6]  R. Griffiths,et al.  An ancestral recombination graph , 1997 .

[7]  M. McPeek,et al.  Assessment of linkage disequilibrium by the decay of haplotype sharing, with application to fine-scale genetic mapping. , 1999, American journal of human genetics.

[8]  D. Clayton Linkage Disequilibrium Mapping of Disease Susceptibility Genes in Human Populations , 2000 .

[9]  P. Donnelly,et al.  A new statistical method for haplotype reconstruction from population data. , 2001, American journal of human genetics.

[10]  M. Boehnke,et al.  Experimentally-derived haplotypes substantially increase the efficiency of linkage disequilibrium studies , 2001, Nature Genetics.

[11]  C. Sabatti,et al.  Bayesian analysis of haplotypes for linkage disequilibrium mapping. , 2001, Genome research.

[12]  A. Jeffreys,et al.  Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex , 2001, Nature Genetics.

[13]  Jun S. Liu,et al.  Monte Carlo strategies in scientific computing , 2001 .

[14]  Daniel J Schaid,et al.  Relative efficiency of ambiguous vs. directly measured haplotype frequencies , 2002, Genetic epidemiology.

[15]  D J Balding,et al.  Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies. , 2002, American journal of human genetics.

[16]  P. R. Boyd,et al.  Linkage disequilibrium mapping identifies a 390 kb region associated with CYP2D6 poor drug metabolising activity , 2002, The Pharmacogenomics Journal.

[17]  Tianhua Niu,et al.  Haplotype information and linkage disequilibrium mapping for single nucleotide polymorphisms. , 2003, Genome research.

[18]  Peter Donnelly,et al.  A comparison of bayesian methods for haplotype reconstruction from population genotype data. , 2003, American journal of human genetics.

[19]  David J. Balding,et al.  Multipoint linkage-disequilibrium mapping narrows location interval and identifies mutation heterogeneity , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[20]  B. J. Carey,et al.  Chromosome-wide distribution of haplotype blocks and the role of recombination hot spots , 2003, Nature Genetics.

[21]  D. Balding,et al.  Handbook of statistical genetics , 2004 .