A Bayesian Model for Detecting Past Recombination Events in DNA Multiple Alignments

Most phylogenetic tree estimation methods assume that a single set of hierarchical relationships among the sequences in a data set holds at every site along an alignment. Mosaic sequences produced by past recombination events violate this assumption, and imposing a single tree on the entire alignment may then give misleading phylogenetic results. Detecting past recombination is therefore an important first step in an analysis. A Bayesian model for the changes in topology caused by recombination events is described here. The model relaxes the assumption of one topology for all sites in an alignment and uses the theory of hidden Markov models to facilitate calculations, the hidden states being the underlying topologies at each site in the data set. Changes in topology along the multiple sequence alignment are estimated by means of the maximum a posteriori (MAP) estimate. The performance of the MAP estimate is assessed by applying the model to simulated and real data sets of four sequences.
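
To make the hidden-state idea concrete, the sketch below treats the topology at each site as the hidden state of a hidden Markov model and recovers the most probable sequence of topologies with the standard Viterbi recursion. It is an illustration under stated assumptions, not the procedure of the paper: the per-site log-likelihoods `site_loglik` are assumed to be precomputed under each candidate topology (for four sequences there are three unrooted topologies), and the single `stay_prob` transition parameter, the uniform prior, and the function names are hypothetical choices made for this example.

```python
import numpy as np


def viterbi_topologies(site_loglik, stay_prob=0.95, prior=None):
    """Most probable topology path under a simple HMM (illustrative sketch).

    site_loglik: array of shape (n_sites, n_topologies) holding
        log P(alignment column | topology), assumed precomputed elsewhere.
    stay_prob: assumed probability that adjacent sites share a topology;
        the remaining mass is split evenly over the other topologies.
    prior: optional prior over topologies at the first site (uniform if None).
    Requires at least two candidate topologies.
    """
    site_loglik = np.asarray(site_loglik, dtype=float)
    n_sites, n_states = site_loglik.shape
    if prior is None:
        prior = np.full(n_states, 1.0 / n_states)

    # Log transition matrix: stay_prob on the diagonal, uniform elsewhere.
    switch_prob = (1.0 - stay_prob) / (n_states - 1)
    log_trans = np.full((n_states, n_states), np.log(switch_prob))
    np.fill_diagonal(log_trans, np.log(stay_prob))

    # Forward pass: best log score of any path ending in each state.
    score = np.log(prior) + site_loglik[0]
    back = np.zeros((n_sites, n_states), dtype=int)
    for t in range(1, n_sites):
        cand = score[:, None] + log_trans  # cand[i, j]: move from i to j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_states)] + site_loglik[t]

    # Backward pass: trace the best path from the last site.
    path = np.empty(n_sites, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n_sites - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def breakpoints(path):
    """Sites at which the inferred topology differs from the previous site."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]


if __name__ == "__main__":
    # Toy input: 20 sites favouring topology 0, then 20 favouring topology 2,
    # with one isolated site (index 5) favouring topology 1.
    favoured = [0] * 20 + [2] * 20
    favoured[5] = 1
    probs = np.full((40, 3), 0.1)
    probs[np.arange(40), favoured] = 0.8
    path = viterbi_topologies(np.log(probs))
    print(path.tolist())
    print(breakpoints(path))
```

In this toy input the isolated site that favours a different topology is not enough to pay for two transitions, so the inferred path changes topology only at the single simulated breakpoint. How strongly such noise is smoothed depends on `stay_prob`, which here is fixed by hand rather than estimated.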
