Statistical Phylogenetic Tree Analysis Using Differences of Means

We propose a statistical method to test whether two phylogenetic trees with given alignments are significantly incongruent. Rather than comparing point estimates of the trees, our method compares the two distributions of phylogenetic trees induced by the two input alignments. This statistical approach can be applied, for example, in gene tree analysis to detect unusual events in genome evolution such as horizontal gene transfer and reshuffling. After mapping trees into a vector space, our method uses the difference of means to compare the two distributions of trees, and p-values are then obtained by bootstrapping alignment columns. To compute distances between means, we employ a “kernel method”, which speeds up distance calculations when trees are mapped into a high-dimensional feature space, e.g., the splits or quartets feature space. In this pilot study, we first test our statistical method on data sets simulated under a coalescent model, checking whether two alignments were generated by congruent gene trees. We then apply it to data sets of gophers and lice, grasses and their endophytes, and different fungal genes from the same genome. A companion toolkit, Phylotree, is provided to facilitate computational experiments.
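To make the difference-of-means idea concrete, here is a minimal Python sketch. It assumes trees are already available as nested tuples (in practice, one tree inferred per bootstrap replicate of each alignment) and uses the splits feature space only. If φ(T) is the 0/1 indicator vector of T's nontrivial splits, the kernel k(T, T') = φ(T)·φ(T') is simply the number of shared splits, and the squared distance between the means of samples X (size m) and Y (size n) expands to (1/m²)Σ k(x_i, x_j) + (1/n²)Σ k(y_i, y_j) − (2/(mn))Σ k(x_i, y_j), so the very long split vectors are never built explicitly. The helper names (`splits`, `split_kernel`, `mean_distance_sq`, `permutation_pvalue`) are hypothetical and do not reproduce the Phylotree toolkit's API. The p-value step uses a label-permutation test on the pooled tree samples, a swapped-in calibration that is not necessarily the procedure used in this paper.

```python
import random
from itertools import product

# Hypothetical helpers: a sketch of a split-kernel mean-difference test,
# not the Phylotree toolkit's API.


def _collect_clades(tree, out):
    """Append the leaf set of every subtree of a nested-tuple tree to `out`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(_collect_clades(c, out) for c in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Nontrivial splits of the unrooted topology, each stored as the side
    that does not contain the lexicographically smallest taxon."""
    found = []
    taxa = _collect_clades(tree, found)
    anchor = min(taxa)
    result = set()
    for side in found:
        other = taxa - side
        if len(side) < 2 or len(other) < 2:
            continue  # trivial split (a single leaf) or the whole taxon set
        result.add(other if anchor in side else side)
    return frozenset(result)


def split_kernel(s1, s2):
    """Inner product of split indicator vectors, i.e. the number of shared splits."""
    return len(s1 & s2)


def mean_distance_sq(xs, ys, kernel=split_kernel):
    """Squared Euclidean distance between the feature-space means of two
    samples, computed only through kernel evaluations."""
    m, n = len(xs), len(ys)
    kxx = sum(kernel(a, b) for a, b in product(xs, xs)) / (m * m)
    kyy = sum(kernel(a, b) for a, b in product(ys, ys)) / (n * n)
    kxy = sum(kernel(a, b) for a, b in product(xs, ys)) / (m * n)
    return kxx + kyy - 2 * kxy


def permutation_pvalue(xs, ys, n_perm=199, seed=0):
    """Observed statistic and a label-permutation p-value (an assumed
    calibration, not necessarily the paper's bootstrap procedure)."""
    rng = random.Random(seed)
    observed = mean_distance_sq(xs, ys)
    pooled = list(xs) + list(ys)
    m = len(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_distance_sq(pooled[:m], pooled[m:]) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy usage: two 5-taxon topologies that differ in one split. In practice each
# list would hold one tree per bootstrap replicate of an alignment.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
xs = [splits(t1)] * 20
ys = [splits(t2)] * 20
stat, p = permutation_pvalue(xs, ys)
print(stat, p)
```

In this toy case the two topologies share one of their two nontrivial splits, so the squared distance between the means is 2.0 and the permutation p-value is small; comparing `xs` against a copy of itself gives a distance of 0.0 and a p-value of 1.0. A quartets feature space would only require swapping in a different kernel function.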
