Dual multiple change-point model leads to more accurate recombination detection

MOTIVATION: We introduce a dual multiple change-point (MCP) model for recombination detection among aligned nucleotide sequences. The dual MCP model is an extension of the model introduced previously by Suchard and co-workers. In the original single MCP model, one change-point process is used to model spatial phylogenetic variation. Here, we show that using two change-point processes, one for spatial variation of tree topologies and the other for spatial variation of substitution process parameters, increases recombination detection accuracy. Statistical analysis is done in a Bayesian framework using reversible jump Markov chain Monte Carlo sampling to approximate the joint posterior distribution of all model parameters.

RESULTS: We use primate mitochondrial DNA data with simulated recombination break-points at specific locations to compare the two models. We also analyze two real HIV sequences to identify recombination break-points using the dual MCP model.
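
The difference between the single and dual MCP structures can be illustrated with a small sketch. It is not the authors' implementation and performs no inference; it only shows how two independent sets of change-points partition an alignment. The function names, the example break-point positions, and the convention that a change-point at position c starts a new segment at site c are all assumptions made for illustration.

```python
from bisect import bisect_right


def segment_index(site, change_points):
    """Index of the segment containing `site`.

    `change_points` is a sorted list of site positions; by the (assumed)
    convention used here, a change-point at position c means that site c
    is the first site of a new segment.
    """
    return bisect_right(change_points, site)


def dual_partition(n_sites, topology_cps, parameter_cps):
    """Label every site with (topology segment, parameter segment).

    The two change-point lists are independent of each other, which is
    the defining feature of the dual structure: the topology may change
    where the substitution parameters do not, and vice versa.
    """
    return [
        (segment_index(s, topology_cps), segment_index(s, parameter_cps))
        for s in range(n_sites)
    ]


# Hypothetical toy alignment of 10 sites: one topology break-point at
# site 4 and two substitution-parameter change-points at sites 2 and 7.
labels = dual_partition(10, topology_cps=[4], parameter_cps=[2, 7])
for site, (topo, par) in enumerate(labels):
    print(site, topo, par)
```

In a single MCP structure, every change-point would start a new segment for both the topology and the substitution parameters. In the dual structure sketched here, only the change-point at site 4 corresponds to a change in tree topology, so only it would be read as a candidate recombination break-point.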
