Distinguishing Regional from Within-Codon Rate Heterogeneity in DNA Sequence Alignments

We present an improved phylogenetic factorial hidden Markov model (FHMM) for detecting two types of mosaic structure in DNA sequence alignments, related to (1) recombination and (2) rate heterogeneity. The present work focuses on improving how the latter is modelled. Earlier papers have modelled different degrees of rate heterogeneity with separate hidden states of the FHMM. This approach does not account for the intrinsic difference between two types of rate heterogeneity: long-range regional effects, which are potentially related to differences in selective pressure, and short-range periodic patterns within codons, which merely capture the signature of the genetic code. We propose an improved model that explicitly distinguishes between these two effects, and we assess its performance on a set of simulated DNA sequence alignments.
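The distinction between the two effects can be illustrated with a small simulation. The sketch below is not the FHMM proposed here. It assumes, purely for illustration, that the rate at each site is the product of a regional factor that changes rarely along the alignment and a periodic factor that depends only on the codon position (site index modulo 3). The function name, the rate values, and the switching probability are all hypothetical.

```python
import random


def simulate_site_rates(n_sites,
                        codon_rates=(0.5, 0.3, 2.2),    # hypothetical per-codon-position factors
                        regional_levels=(0.5, 2.0),     # hypothetical regional factors
                        switch_prob=0.01,               # hypothetical chance of a regional change
                        seed=0):
    """Return one relative substitution rate per alignment site.

    Illustrative assumption: rate = regional factor * codon-position factor.
    The regional factor persists over long stretches (long-range effect);
    the codon-position factor repeats with period 3 (within-codon effect).
    """
    rng = random.Random(seed)
    region = 0
    rates = []
    for site in range(n_sites):
        # Long-range effect: the regional level changes only occasionally.
        if site > 0 and rng.random() < switch_prob:
            region = rng.randrange(len(regional_levels))
        # Short-range effect: period-3 pattern tied to the codon position.
        rates.append(regional_levels[region] * codon_rates[site % 3])
    return rates


if __name__ == "__main__":
    for site, rate in enumerate(simulate_site_rates(12)):
        print(site, site % 3, round(rate, 3))
```

A model with a single set of rate states would have to explain both the period-3 pattern and the regional shifts with the same hidden variable, which is the conflation that the proposed model is designed to avoid.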
