Efficient computation of close lower and upper bounds on the minimum number of recombinations in biological sequence evolution

MOTIVATION: We are interested in studying the evolution of DNA single nucleotide polymorphism sequences which have undergone (meiotic) recombination. For a given set of sequences, computing the minimum number of recombinations needed to explain the sequences (with one mutation per site) is a standard question of interest, but it has been shown to be NP-hard, and previous algorithms that compute it exactly work either only on very small datasets or on problems with special structure.

RESULTS: In this paper, we present efficient, practical methods for computing both upper and lower bounds on the minimum number of needed recombinations, and for constructing evolutionary histories that explain the input sequences. We study in detail the efficiency and accuracy of these algorithms on both simulated and real data sets. The algorithms produce very close upper and lower bounds, which match exactly in a surprisingly wide range of data. Thus, with the use of new, very effective lower bounding methods and an efficient algorithm for computing upper bounds, this approach allows the efficient, exact computation of the minimum number of needed recombinations, with high frequency in a large range of data. When upper and lower bounds match, evolutionary histories found by our algorithm correspond to the most parsimonious histories.

AVAILABILITY: HapBound and SHRUB, programs implementing the new algorithms discussed in this paper, are available at http://wwwcsif.cs.ucdavis.edu/~gusfield/lu.html
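
To make the notion of a recombination lower bound concrete, the sketch below computes the classical Hudson-Kaplan bound on binary SNP sequences. This is not the HapBound method announced above, whose details the abstract does not give; it is a well-known, simpler bound shown only for illustration. It assumes that sequences are equal-length strings over '0' and '1' and that each site mutates at most once, so any pair of sites showing all four gametes (00, 01, 10, 11) requires at least one recombination between them. The function names `incompatible` and `hk_lower_bound` are hypothetical.

```python
from itertools import combinations


def incompatible(rows, i, j):
    """Four-gamete test: True if sites i and j show all of 00, 01, 10, 11."""
    return len({(r[i], r[j]) for r in rows}) == 4


def hk_lower_bound(rows):
    """Hudson-Kaplan lower bound on the minimum number of recombinations.

    Each incompatible site pair (i, j) forces a breakpoint somewhere
    between site i and site j. The bound is the largest number of such
    intervals that can be chosen without overlapping; intervals that
    only share an endpoint site count as non-overlapping.
    """
    m = len(rows[0]) if rows else 0
    intervals = [
        (i, j) for i, j in combinations(range(m), 2) if incompatible(rows, i, j)
    ]
    # Greedy interval scheduling: take intervals by earliest right endpoint.
    intervals.sort(key=lambda ij: ij[1])
    count, last_right = 0, 0
    for i, j in intervals:
        if i >= last_right:
            count += 1
            last_right = j
    return count


if __name__ == "__main__":
    sample = ["000", "011", "101", "110"]
    print(hk_lower_bound(sample))
```

A bound of this kind is cheap to compute but can be far from the true minimum, which is why tighter lower bounds paired with a good upper bound are of interest.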
