A Mixture Model and a Hidden Markov Model to Simultaneously Detect Recombination Breakpoints and Reconstruct Phylogenies

Homologous recombination is a pervasive biological process that affects sequences in all living organisms and viruses. In the presence of recombination, the evolutionary history of an alignment of homologous sequences cannot be properly depicted by a single bifurcating tree: some sites have evolved along one phylogenetic tree, while others have followed a different one. Existing methods for analysing recombination typically scan the alignment with sliding windows or are computationally very demanding, and they are often limited to nucleotide sequences. In this article, we propose and implement a Mixture Model on trees and a phylogenetic Hidden Markov Model that reveal recombination breakpoints while searching for the various evolutionary histories present in an alignment known to have undergone homologous recombination. These models are efficient enough to be applied to dozens of sequences on a single desktop computer, and they handle nucleotide and protein sequences equally well. We estimate their accuracy on simulated sequences and test them on real data.
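The following minimal Python sketch illustrates how the two kinds of model differ. It assumes that the log-likelihood of every alignment site under every candidate tree has already been computed (for instance with Felsenstein's pruning algorithm). In the mixture model, each site independently picks a tree according to prior weights. In the HMM, neighbouring sites tend to share the same tree, so a change of tree costs a penalty, and breakpoints are the sites where the decoded tree changes. The function names, the uniform switching probability and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_posteriors(site_loglik, weights):
    """Mixture model (hypothetical helper): each site independently uses tree k
    with prior weights[k]. Returns P(tree k | site i) for every site."""
    posteriors = []
    for row in site_loglik:
        joint = [math.log(w) + ll for w, ll in zip(weights, row)]
        norm = logsumexp(joint)
        posteriors.append([math.exp(j - norm) for j in joint])
    return posteriors


def viterbi_trees(site_loglik, switch_prob):
    """Phylogenetic HMM (hypothetical helper): the hidden state is the tree.
    Adjacent sites keep the same tree with probability 1 - switch_prob and
    otherwise jump uniformly to another tree (assumed model, needs >= 2 trees).
    Returns the most probable tree index for each site (Viterbi path)."""
    n_trees = len(site_loglik[0])
    stay = math.log(1.0 - switch_prob)
    move = math.log(switch_prob / (n_trees - 1))
    # Uniform initial distribution over trees.
    score = [-math.log(n_trees) + ll for ll in site_loglik[0]]
    back = []  # back[t-1][k]: best previous tree if site t uses tree k
    for row in site_loglik[1:]:
        new_score, pointers = [], []
        for k in range(n_trees):
            cands = [score[j] + (stay if j == k else move) for j in range(n_trees)]
            best = max(range(n_trees), key=cands.__getitem__)
            pointers.append(best)
            new_score.append(cands[best] + row[k])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_trees), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def breakpoints(path):
    """Indices of sites where the decoded tree differs from the previous site."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


if __name__ == "__main__":
    # Toy data: 6 sites, 2 trees; the first half favours tree 0, the rest tree 1.
    site_loglik = [[-2.0, -5.0], [-2.1, -4.8], [-1.9, -5.2],
                   [-5.0, -2.0], [-4.9, -2.2], [-5.1, -1.8]]
    path = viterbi_trees(site_loglik, switch_prob=0.01)
    print(path, breakpoints(path))  # prints [0, 0, 0, 1, 1, 1] [3]
```

The mixture posteriors can assign neighbouring sites to different trees at no cost, whereas the HMM's switching penalty favours long contiguous segments, which is what makes its decoded path directly interpretable as a set of recombination breakpoints.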
