Phylogenetic models of rate heterogeneity: a high performance computing perspective

Inference of phylogenetic trees using the maximum likelihood (ML) method is NP-hard. Furthermore, computing the likelihood function for huge trees of more than 1,000 organisms is computationally intensive because of the large number of floating-point operations and the high memory consumption. Within this context, the present paper compares two competing mathematical models that account for evolutionary rate heterogeneity: the Gamma and CAT models. The intention of this paper is to show that, from a purely empirical point of view, CAT can be used instead of Gamma. The main advantages of CAT over Gamma are significantly lower memory consumption and faster inference times. An experimental study using RAxML was performed on 19 real-world datasets comprising 73 to 1,663 DNA sequences. Results show that CAT is on average 5.5 times faster than Gamma and, surprisingly, also yields trees that have slightly better likelihood values when evaluated under Gamma. Using the CAT model reduces the average number of L2 and L3 cache misses by a factor of 8.55.
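The memory advantage of CAT can be illustrated with a rough estimate. Under a discrete Gamma model with K equally probable rate categories, the likelihood of site i is the average (1/K) Σ_k L_i(r_k), so every conditional likelihood vector must store K sets of state probabilities per site. Under CAT, each site is assigned a single rate category, so one set per site suffices. The sketch below makes several assumptions that are not taken from the paper or from RAxML's source code: DNA data (4 states), the common setting of K = 4 Gamma categories, double precision, and one vector per inner node of an unrooted binary tree. The function names `clv_bytes` and `gamma_vs_cat` and the figure of 5,000 site patterns are hypothetical.

```python
"""Rough memory estimate for conditional likelihood vectors (CLVs).

Illustrative assumptions (not taken from RAxML's implementation):
- DNA data, i.e. 4 character states per site;
- Gamma uses K discrete rate categories (K = 4 by default here);
- CAT assigns exactly one rate category to each site;
- one CLV of 8-byte doubles per inner node of an unrooted binary
  tree with n taxa, which has n - 2 inner nodes.
"""

STATES = 4              # A, C, G, T
BYTES_PER_DOUBLE = 8    # IEEE 754 double precision


def clv_bytes(n_taxa: int, n_patterns: int, categories_per_site: int) -> int:
    """Bytes needed to store one CLV per inner node (hypothetical helper)."""
    if n_taxa < 3:
        raise ValueError("an unrooted binary tree needs at least 3 taxa")
    inner_nodes = n_taxa - 2
    entries = inner_nodes * n_patterns * categories_per_site * STATES
    return entries * BYTES_PER_DOUBLE


def gamma_vs_cat(n_taxa: int, n_patterns: int, gamma_categories: int = 4):
    """Return (bytes under Gamma, bytes under CAT) for the same alignment."""
    gamma = clv_bytes(n_taxa, n_patterns, gamma_categories)
    cat = clv_bytes(n_taxa, n_patterns, 1)  # CAT: a single rate per site
    return gamma, cat


if __name__ == "__main__":
    # 1,663 taxa matches the largest dataset size mentioned above;
    # 5,000 distinct site patterns is a hypothetical value.
    gamma, cat = gamma_vs_cat(n_taxa=1663, n_patterns=5000)
    print(f"Gamma: {gamma / 1e6:.1f} MB, CAT: {cat / 1e6:.1f} MB")
    # prints "Gamma: 1063.0 MB, CAT: 265.8 MB"
```

Under these assumptions the storage ratio is simply K, i.e. 4. The sketch counts storage only; it does not model run time or cache behavior, so it should not be read as a prediction of the measured speedup or cache-miss reduction.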
