Detecting recombination with MCMC

MOTIVATION: We present a statistical method for detecting recombination whose objective is to locate recombinant breakpoints accurately in DNA sequence alignments of small numbers of taxa (4 or 5). The approach explicitly models the sequence of phylogenetic tree topologies along a multiple sequence alignment. Inference under this model is Bayesian and is carried out with Markov chain Monte Carlo (MCMC). The algorithm returns the site-dependent posterior probability of each tree topology, which is used to detect recombinant regions and locate their breakpoints.

RESULTS: The method was tested on one synthetic and three real DNA sequence alignments, on which it was found to outperform the established detection methods PLATO, RECPARS and TOPAL.
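The abstract describes the model only at a high level. As an illustration, the Python sketch below treats the topology at each alignment column as the hidden state of a first-order Markov chain and estimates the site-wise posterior topology probabilities by single-site Gibbs sampling, one simple form of MCMC. Everything concrete here is an assumption for illustration, not the authors' implementation: the switching model (keep the current topology with probability `stay`, otherwise move uniformly to another one), the fixed value `stay=0.95`, and the per-site log-likelihoods `site_loglik`, which in practice would be computed from a nucleotide substitution model on each candidate tree. A full analysis would also sample branch lengths and the switching parameter. The helper names `gibbs_topology_posterior` and `breakpoints` are hypothetical. With four taxa there are three unrooted topologies, so the toy example uses three hidden states.

```python
import math
import random


def transition(prev, cur, stay, k):
    """Hypothetical switching model: keep the same topology with probability
    `stay`, otherwise jump uniformly to one of the other k - 1 topologies (k >= 2)."""
    return stay if prev == cur else (1.0 - stay) / (k - 1)


def gibbs_topology_posterior(site_loglik, stay=0.95, n_iter=2000, burn_in=500, seed=0):
    """Estimate P(S_t = j | data) for every site t by single-site Gibbs sampling.

    site_loglik[t][j] is log P(D_t | topology j), assumed precomputed and fixed.
    Only the hidden topology sequence is sampled; other parameters are held fixed."""
    rng = random.Random(seed)
    n_sites, k = len(site_loglik), len(site_loglik[0])
    # Initialise each site with its locally most likely topology.
    state = [max(range(k), key=lambda j: row[j]) for row in site_loglik]
    counts = [[0] * k for _ in range(n_sites)]
    for it in range(n_iter):
        for t in range(n_sites):
            # Full conditional: emission x transition in x transition out (log scale).
            logw = []
            for j in range(k):
                lw = site_loglik[t][j]
                if t > 0:
                    lw += math.log(transition(state[t - 1], j, stay, k))
                if t < n_sites - 1:
                    lw += math.log(transition(j, state[t + 1], stay, k))
                logw.append(lw)
            m = max(logw)  # subtract the max before exponentiating for stability
            weights = [math.exp(x - m) for x in logw]
            state[t] = rng.choices(range(k), weights=weights)[0]
        if it >= burn_in:
            # Tally post-burn-in samples to estimate the marginal posteriors.
            for t, s in enumerate(state):
                counts[t][s] += 1
    n_kept = n_iter - burn_in
    return [[c / n_kept for c in row] for row in counts]


def breakpoints(posterior):
    """Return the sites t at which the posterior-mode topology differs from site t - 1."""
    modes = [max(range(len(p)), key=p.__getitem__) for p in posterior]
    return [t for t in range(1, len(modes)) if modes[t] != modes[t - 1]]


if __name__ == "__main__":
    # Toy alignment: sites 0-29 favour topology 0, sites 30-59 favour topology 1.
    toy = [[-1.0, -2.0, -2.0]] * 30 + [[-2.0, -1.0, -2.0]] * 30
    post = gibbs_topology_posterior(toy)
    print(breakpoints(post))
```

On this toy input the posterior mode should switch once, close to site 30. Reading breakpoints off the posterior mode is the simplest rule; thresholding the posterior probability of the dominant topology would be an equally simple alternative.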
