Joint estimation of gene conversion rates and mean conversion tract lengths from population SNP data

Motivation: Two known types of meiotic recombination are crossovers and gene conversions. Although they leave behind different footprints in the genome, it is a challenging task to tease apart their relative contributions to the observed genetic variation. In particular, for a given population SNP dataset, the joint estimation of the crossover rate, the gene conversion rate and the mean conversion tract length is widely viewed as a very difficult problem.

Results: In this article, we devise a likelihood-based method using an interleaved hidden Markov model (HMM) that can jointly estimate the aforementioned three parameters fundamental to recombination. Our method significantly improves upon a recently proposed method based on a factorial HMM. We show that modeling overlapping gene conversions is crucial for improving the joint estimation of the gene conversion rate and the mean conversion tract length. We test the performance of our method on simulated data. We then apply our method to analyze real biological data from the telomere of the X chromosome of Drosophila melanogaster, and show that the ratio of the gene conversion rate to the crossover rate for the region may not be nearly as high as previously claimed.

Availability: A software implementation of the algorithms discussed in this article is available at http://www.cs.berkeley.edu/~yss/software.html.

Contact: yss@eecs.berkeley.edu
