A Bayesian compound stochastic process for modeling nonstationary and nonhomogeneous sequence evolution.

Variations in nucleotide composition affect phylogenetic inference conducted under stationary models of evolution. In particular, they may cause unrelated taxa that share a similar base composition to be grouped together in the resulting phylogeny. To address this problem, we developed a nonstationary and nonhomogeneous model that accounts for compositional biases. Previous nonstationary models are branchwise, that is, they assume that base composition changes only at the nodes of the tree; in our model, by contrast, the process of compositional drift is fully uncoupled from speciation events. In addition, the total number of compositional drift events distributed across the tree is inferred directly from the data. We implemented the method in a Bayesian framework, relying on Markov chain Monte Carlo algorithms, and applied it to several nucleotide data sets. In most cases, the stationarity assumption was rejected in favor of our nonstationary model. We also show that our method is able to resolve a well-known artifact. Using Bayes factors, we compared our model with two previously developed nonstationary models. We show that the coupling between speciations and compositional shifts inherent to branchwise models may lead to overparameterization, resulting in a poorer fit. In some cases, this leads to incorrect conclusions about the nature of the compositional biases. In contrast, our compound model adapts its effective number of parameters more flexibly to the data sets under investigation. Altogether, our results show that accounting for nonstationary sequence evolution may require more elaborate and more flexible models than those currently used.
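The idea of drift events that are uncoupled from speciation can be illustrated with a small simulation. The sketch below is not the implementation described here; it assumes, for illustration only, that drift events fall along each branch as a homogeneous Poisson process with a hypothetical `rate`, and that each event draws a fresh base composition from a symmetric Dirichlet distribution. The function names and the toy tree are likewise hypothetical. A branch inherits the composition in force at the end of its parent branch, so composition changes only at sampled breakpoints and never merely because a node was crossed.

```python
import random


def sample_breakpoints(branch_length, rate, rng):
    """Positions of drift events on one branch (assumed homogeneous Poisson process)."""
    if rate <= 0:
        return []  # no drift events: the stationary special case
    positions = []
    t = rng.expovariate(rate)
    while t < branch_length:
        positions.append(t)
        t += rng.expovariate(rate)
    return positions


def sample_composition(rng, alpha=1.0):
    """Draw (A, C, G, T) frequencies from a symmetric Dirichlet via normalized gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(4)]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_drift(branches, rate, seed=0):
    """Split every branch into segments of constant base composition.

    `branches` maps a branch name to (parent branch name or None, length),
    listed so that each parent appears before its children.
    Returns {name: [(start, end, composition), ...]}.
    """
    rng = random.Random(seed)
    root_comp = sample_composition(rng)
    segments = {}
    for name, (parent, length) in branches.items():
        # Inherit the composition at the end of the parent branch:
        # crossing a node does not by itself change the composition.
        comp = root_comp if parent is None else segments[parent][-1][2]
        bounds = [0.0] + sample_breakpoints(length, rate, rng) + [length]
        pieces = []
        for i in range(len(bounds) - 1):
            if i > 0:
                comp = sample_composition(rng)  # a drift event occurred here
            pieces.append((bounds[i], bounds[i + 1], comp))
        segments[name] = pieces
    return segments


if __name__ == "__main__":
    toy_tree = {
        "root_to_A": (None, 0.5),
        "A_to_x": ("root_to_A", 0.3),
        "A_to_y": ("root_to_A", 0.8),
    }
    for branch, pieces in simulate_drift(toy_tree, rate=2.0, seed=42).items():
        print(branch, len(pieces) - 1, "drift event(s)")
```

With `rate=0` every branch keeps the root composition, which corresponds to the stationary case. In a branchwise model the number of composition changes would instead be fixed by the number of branches, whereas here it varies with the sampled breakpoints.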

[1]  G. Bernardi,et al.  The vertebrate genome: isochores and evolution. , 1993, Molecular biology and evolution.

[2]  M. Gouy,et al.  Inferring phylogenies from DNA sequences of unequal base compositions. , 1995, Proceedings of the National Academy of Sciences of the United States of America.

[3]  A. Gelfand [Practical Markov Chain Monte Carlo]: Comment , 1992 .

[4]  L. Jermiin,et al.  Nucleotide Composition Bias Affects Amino Acid Content in Proteins Coded by Animal Mitochondria , 1997, Journal of Molecular Evolution.

[5]  Masami Hasegawa,et al.  Ribosomal RNA trees misleading? , 1993, Nature.

[6]  Adrian E. Raftery,et al.  [Practical Markov Chain Monte Carlo]: Comment: One Long Run with Diagnostics: Implementation Strategies for Markov Chain Monte Carlo , 1992 .

[7]  Fred R. McMorris,et al.  Consensusn-trees , 1981 .

[8]  P. Lockhart,et al.  Substitutional bias confounds inference of cyanelle origins from sequence data , 1992, Journal of Molecular Evolution.

[9]  Ross A. Overbeek,et al.  The Ribosomal Database Project (RDP) , 1996, Nucleic Acids Res..

[10]  Joshua T Herbeck,et al.  Nonhomogeneous model of sequence evolution indicates independent origins of primary endosymbionts within the enterobacteriales (gamma-Proteobacteria). , 2005, Molecular biology and evolution.

[11]  J. Lake,et al.  Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. , 1994, Proceedings of the National Academy of Sciences of the United States of America.

[12]  R. H. Thomas,et al.  Reduced thermophilic bias in the 16S rDNA sequence from Thermus ruber provides further support for a relationship between Thermus and Deinococcus , 1993 .

[13]  David B. Dunson,et al.  Bayesian Data Analysis , 2010 .

[14]  Allan C. Wilson,et al.  Mitochondrial DNA sequences of primates: Tempo and mode of evolution , 2005, Journal of Molecular Evolution.

[15]  M. Gouy,et al.  A nonhyperthermophilic common ancestor to extant life forms. , 1999, Science.

[16]  S. Andersson,et al.  A phylogenomic study of endosymbiotic bacteria. , 2004, Molecular biology and evolution.

[17]  John P. Huelsenbeck,et al.  MRBAYES: Bayesian inference of phylogenetic trees , 2001, Bioinform..

[18]  P. Green,et al.  Trans-dimensional Markov chain Monte Carlo , 2000 .

[19]  James R. Cole,et al.  The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy , 2003, Nucleic Acids Res..

[20]  H. Philippe,et al.  Archaea sister group of Bacteria? Indications from tree reconstruction artifacts in ancient phylogenies. , 1999, Molecular biology and evolution.

[21]  E. Holmes,et al.  The evolution of base composition and phylogenetic inference. , 2000, Trends in ecology & evolution.

[22]  F. Ayala,et al.  Shared nucleotide composition biases among species and their impact on phylogenetic reconstructions of the Drosophilidae. , 2001, Molecular biology and evolution.

[23]  B. Larget,et al.  Markov Chain Monte Carlo Algorithms for the Bayesian Analysis of Phylogenetic Trees , 2000 .

[24]  Radhey S. Gupta Protein Phylogenies and Signature Sequences: A Reappraisal of Evolutionary Relationships among Archaebacteria, Eubacteria, and Eukaryotes , 1998, Microbiology and Molecular Biology Reviews.

[25]  D. Penny,et al.  Comment on "Hexapod Origins: Monophyletic or Paraphyletic?" , 2003, Science.

[26]  C R Woese,et al.  Archaeal phylogeny: reexamination of the phylogenetic position of Archaeoglobus fulgidus in light of certain composition-induced artifacts. , 1991, Systematic and applied microbiology.

[27]  P. Higgs RNA secondary structure: physical and computational aspects , 2000, Quarterly Reviews of Biophysics.

[28]  M. Steel,et al.  Recovering evolutionary trees under a more realistic model of sequence evolution. , 1994, Molecular biology and evolution.

[29]  Peter G Foster,et al.  Modeling compositional heterogeneity. , 2004, Systematic biology.

[30]  Jonathan A. Eisen,et al.  The RecA protein as a model molecule for molecular systematic studies of bacteria: Comparison of trees of RecAs and 16S rRNAs from the same species , 1995, Journal of Molecular Evolution.

[31]  M. Gouy,et al.  Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. , 1998, Molecular biology and evolution.

[32]  Peter Green,et al.  Highly Structured Stochastic Systems , 2003 .

[33]  Z. Yang,et al.  On the use of nucleic acid sequences to infer early branchings in the tree of life. , 1995, Molecular biology and evolution.

[34]  T. Jukes,et al.  Silent nucleotide substitutions and G+C content of some mitochondrial and bacterial genes , 2005, Journal of Molecular Evolution.

[35]  J. Boore,et al.  Hexapod Origins: Monophyletic or Paraphyletic? , 2003, Science.

[36]  H. Philippe,et al.  The new phylogeny of eukaryotes. , 2000, Current opinion in genetics & development.

[37]  Sudhir Kumar,et al.  Evolutionary distance estimation under heterogeneous substitution pattern among lineages. , 2002, Molecular biology and evolution.

[38]  J. Huelsenbeck,et al.  A compound poisson process for relaxing the molecular clock. , 2000, Genetics.

[39]  Y. Ogata A Monte Carlo method for high dimensional integration , 1989 .

[40]  M. Holder,et al.  Hastings ratio of the LOCAL proposal used in Bayesian phylogenetics. , 2005, Systematic biology.

[41]  J. Felsenstein Evolutionary trees from DNA sequences: A maximum likelihood approach , 2005, Journal of Molecular Evolution.

[42]  Jerry Nedelman,et al.  Book review: “Bayesian Data Analysis,” Second Edition by A. Gelman, J.B. Carlin, H.S. Stern, and D.B. Rubin Chapman & Hall/CRC, 2004 , 2005, Comput. Stat..

[43]  H. Philippe,et al.  Computing Bayes factors using thermodynamic integration. , 2006, Systematic biology.

[44]  M. Stanhope,et al.  Molecular phylogeny of living xenarthrans and the impact of character and taxon sampling on the placental tree rooting. , 2002, Molecular biology and evolution.

[45]  R. Murray,et al.  The Family Deinococcaceae , 1992 .

[46]  Joseph T. Chang,et al.  Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. , 1996, Mathematical biosciences.

[47]  S. Carroll,et al.  More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy. , 2005, Molecular biology and evolution.

[48]  ohn,et al.  Potential Applications and Pitfalls of Bayesian Inference of Phylogeny , 2002 .

[49]  R. Overbeek,et al.  The winds of (evolutionary) change: breathing new life into microbiology. , 1996, Journal of bacteriology.

[50]  D. Penny,et al.  The root of the mammalian tree inferred from whole mitochondrial genomes. , 2003, Molecular phylogenetics and evolution.

[51]  G. Bernardi,et al.  Gene distribution and isochore organization in the nuclear genome of plants. , 1990, Nucleic acids research.

[52]  H. Philippe,et al.  A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. , 2004, Molecular biology and evolution.

[53]  David J. Balding,et al.  Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities , 2003 .

[54]  Peter G. Foster,et al.  Compositional Bias May Affect Both DNA-Based and Protein-Based Phylogenetic Reconstructions , 1999, Journal of Molecular Evolution.

[55]  F. Delsuc,et al.  Phylogenomics: the beginning of incongruence? , 2006, Trends in genetics : TIG.

[56]  T. Embley,et al.  Trichomonas hydrogenosomes contain the NADH dehydrogenase module of mitochondrial complex I , 2004, Nature.