Tree and rate estimation by local evaluation of heterochronous nucleotide data

Abstract

Motivation: Heterochronous gene sequence data is important for characterizing the evolutionary processes of fast-evolving organisms such as RNA viruses. Only a limited set of algorithms exists for estimating the rate of nucleotide substitution and inferring phylogenetic trees from such data. The authors present a new method, Tree and Rate Estimation by Local Evaluation (TREBLE), that robustly estimates the rate of nucleotide substitution and the phylogeny while reducing computational time by several orders of magnitude.

Methods: As the basis of its rate estimation, TREBLE uses a novel geometric interpretation of the molecular clock assumption to deduce a local estimate of the rate of nucleotide substitution for each triplet of dated sequences. Averaging the triplet estimates with a variance weighting yields an initial global estimate of the rate. Starting from this value, an iterative refinement procedure that relies on statistical properties of the triplets generates the final estimate of the global rate of nucleotide substitution. The estimated global rate is then used to build the tree from the pairwise distance matrix with a UPGMA-like algorithm.

Results: Simulation studies show that TREBLE's point estimates of the rate of nucleotide substitution are comparable with those of the best available methods, and its confidence intervals are comparable with those of BEAST. TREBLE's phylogenetic reconstruction is significantly better than that of the other distance matrix method but not as accurate as that of the Bayesian algorithm. Compared with three other algorithms, TREBLE reduces computational time by a factor of at least 3000. Relative to the algorithm with the most accurate estimates of the rate of nucleotide substitution (i.e. BEAST), TREBLE is over 10 000 times more computationally efficient.

Availability:

Contact: jdobrien@ucla.edu
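The variance-weighted averaging step mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so this example assumes standard inverse-variance weighting; the function name `inverse_variance_mean` and the input values are hypothetical and are not taken from TREBLE itself.

```python
from typing import Sequence, Tuple


def inverse_variance_mean(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Combine local rate estimates into one global estimate.

    Assumption: each triplet estimate is weighted by 1 / variance.
    This is the textbook inverse-variance scheme, not necessarily
    the exact weighting used by TREBLE.
    Returns (weighted mean, variance of the weighted mean).
    """
    if len(estimates) != len(variances) or len(estimates) == 0:
        raise ValueError("need equally sized, non-empty inputs")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    mean = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    # For independent estimates, the variance of the inverse-variance
    # weighted mean is the reciprocal of the summed weights.
    return mean, 1.0 / total_weight


if __name__ == "__main__":
    # Hypothetical local rate estimates (substitutions/site/year)
    # from three triplets, with hypothetical variances.
    local_rates = [1.0e-3, 1.4e-3, 0.9e-3]
    local_vars = [1.0e-8, 4.0e-8, 2.0e-8]
    rate, var = inverse_variance_mean(local_rates, local_vars)
    print(f"global rate = {rate:.3e}, variance = {var:.3e}")
```

Under this scheme, triplets with noisier local estimates contribute less to the global rate. Triplets that share sequences are not independent, so the returned variance should be read only as a rough guide.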
