FastML: a web server for probabilistic reconstruction of ancestral sequences

Ancestral sequence reconstruction is essential to a variety of evolutionary studies. Here, we present the FastML web server, a user-friendly tool for the reconstruction of ancestral sequences. FastML implements several novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous-time Markov process and provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution and a list of the k most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable to nucleotide, protein and codon sequences; and (iv) FastML provides a graphical representation of the results, including, for example, a graphical logo of the inferred ancestral sequences. The utility of FastML is demonstrated by reconstructing ancestral sequences of the Env protein from various HIV-1 subtypes. FastML is freely available to all academic users online at http://fastml.tau.ac.il/.
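The indel-coding idea in (i) can be illustrated with a minimal sketch. This is not FastML's implementation: it assumes a simplified scheme in which every distinct gap, identified by its start and end columns in the alignment, becomes one binary character that is 1 for sequences containing exactly that gap and 0 otherwise. The function names `find_gaps` and `code_indels` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List, Tuple

Gap = Tuple[int, int]  # (start, end) alignment columns, end exclusive


def find_gaps(seq: str, gap_char: str = "-") -> List[Gap]:
    """Return every maximal run of gap characters in one aligned sequence."""
    gaps: List[Gap] = []
    start = None
    for i, ch in enumerate(seq):
        if ch == gap_char:
            if start is None:
                start = i  # a new gap opens at this column
        elif start is not None:
            gaps.append((start, i))  # the gap closed just before column i
            start = None
    if start is not None:
        gaps.append((start, len(seq)))  # gap runs to the end of the sequence
    return gaps


def code_indels(alignment: Dict[str, str]) -> Tuple[List[Gap], Dict[str, List[int]]]:
    """Code each distinct gap as one binary (presence/absence) character.

    Simplifying assumption: a sequence scores 1 only if it has a gap with
    exactly the same start and end; overlapping gaps are treated as unrelated.
    """
    if len({len(s) for s in alignment.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    per_seq = {name: set(find_gaps(s)) for name, s in alignment.items()}
    all_gaps = sorted(set().union(*per_seq.values()))
    matrix = {
        name: [1 if gap in gaps else 0 for gap in all_gaps]
        for name, gaps in per_seq.items()
    }
    return all_gaps, matrix
```

For the toy alignment `{"A": "ACG--T", "B": "ACGTTT", "C": "A---TT"}`, `code_indels` finds two gap characters, `(1, 4)` and `(3, 5)`, and codes the sequences as `A: [0, 1]`, `B: [0, 0]`, `C: [1, 0]`. A binary matrix of this kind is what a two-state continuous-time Markov model could then be fitted to when reconstructing ancestral indel states.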
