A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data.

We describe a general likelihood-based 'mixture model' for inferring phylogenetic trees from gene-sequence or other character-state data. The model accommodates cases in which different sites in the alignment evolve in qualitatively distinct ways, but does not require prior knowledge of these patterns or partitioning of the data. We call this qualitative variability in the pattern of evolution across sites "pattern-heterogeneity" to distinguish it both from a homogeneous process of evolution and from one characterized principally by differences in rates of evolution. We present studies showing that the model correctly retrieves the signals of pattern-heterogeneity from simulated gene-sequence data, and we apply the method to protein-coding genes and to a ribosomal 12S data set. The mixture model outperforms conventional partitioning on both of these data sets. We implement the mixture model so that it can detect rate- and pattern-heterogeneity simultaneously. The model reduces to a homogeneous model or a rate-variability model as special cases; it therefore always performs at least as well as these two approaches, and often improves upon them considerably. We make the model available within a Bayesian Markov-chain Monte Carlo framework for phylogenetic inference, as an easy-to-use computer program.
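The core idea of a mixture likelihood can be sketched in a few lines. The sketch below is illustrative only and is not the program described above: it assumes the likelihood of each site under each candidate pattern (on a fixed tree) has already been computed, and it sums those per-pattern likelihoods with mixture weights before taking the product over sites. The function name `mixture_log_likelihood` and all numerical values are hypothetical.

```python
import math


def mixture_log_likelihood(site_liks, weights):
    """Log-likelihood of an alignment under a mixture of evolutionary patterns.

    site_liks[i][j] is the (assumed precomputed) likelihood of site i under
    pattern j on a fixed tree; weights[j] is the mixture weight of pattern j.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for per_pattern in site_liks:
        # Each site's likelihood is a weighted sum over patterns, so no site
        # has to be assigned to a partition in advance.
        site_lik = sum(w * lik for w, lik in zip(weights, per_pattern))
        total += math.log(site_lik)
    return total


# Hypothetical likelihoods for three sites under two patterns.
site_liks = [
    [0.020, 0.001],  # site 1 fits pattern 1 better
    [0.002, 0.030],  # site 2 fits pattern 2 better
    [0.015, 0.012],  # site 3 fits both about equally
]

# Putting all weight on one pattern recovers the homogeneous model.
homogeneous = mixture_log_likelihood(site_liks, [1.0, 0.0])
mixed = mixture_log_likelihood(site_liks, [0.5, 0.5])
```

Because the single-pattern weighting is one point in the space of allowed weights, the likelihood maximized over weights cannot fall below that of the homogeneous model, which is the sense in which the homogeneous model is a special case.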
