Optimized ancestral state reconstruction using Sankoff parsimony

Background

Parsimony methods are widely used in molecular evolution to estimate the most plausible phylogeny for a set of characters. Sankoff parsimony determines the minimum number of changes required in a given phylogeny when a cost is associated with transitions between character states. Although optimizations exist to reduce the computations in the number of taxa, the original algorithm takes time O(n^2) in the number of states n, making it impractical for large values of n.

Results

In this study we introduce an optimization of Sankoff parsimony for the reconstruction of ancestral states when ultrametric or additive cost matrices are used. We analyzed its performance for randomly generated matrices, for the Jukes-Cantor and Kimura two-parameter models of DNA evolution, and in the reconstruction of elongation factor-1α and of ancestral metabolic states for a group of eukaryotes. In all cases the execution time is significantly shorter than with the original implementation.

Conclusion

The algorithms presented here provide a fast computation of Sankoff parsimony for a given phylogeny. Problems where the number of states is large, such as the reconstruction of ancestral metabolism, are particularly well suited to this optimization. Because the method reduces the computations required to calculate the parsimony cost of a single tree, it can be combined with optimizations in the number of taxa that aim at finding the most parsimonious tree.
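For reference, the quadratic cost in the number of states comes from the standard Sankoff dynamic programming recursion: at every internal node and for each candidate parent state, a minimum is taken over all child states. The following is a minimal Python sketch of that baseline recursion only, not of the optimized method introduced in this study. The tree encoding (nested tuples with integer leaf states), the function name `sankoff`, and the unit-cost matrix are assumptions made for illustration.

```python
from math import inf

def sankoff(tree, cost, n_states):
    """Return a list S where S[i] is the minimum cost of the subtree
    when its root is assigned state i (baseline Sankoff recursion)."""
    if isinstance(tree, int):
        # Leaf: the observed state costs 0, every other state is impossible.
        return [0 if i == tree else inf for i in range(n_states)]
    total = [0] * n_states
    for child in tree:
        child_costs = sankoff(child, cost, n_states)
        for i in range(n_states):
            # For each of the n parent states, scan all n child states:
            # this inner minimum is the O(n^2) work per edge.
            total[i] += min(cost[i][j] + child_costs[j]
                            for j in range(n_states))
    return total

# Hypothetical example: four states with unit cost for any change.
unit = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
tree = ((0, 1), (0, 2))          # leaves carry observed states 0, 1, 0, 2
root_costs = sankoff(tree, unit, 4)
parsimony_cost = min(root_costs)  # minimum over all root states
```

With the unit-cost matrix, `parsimony_cost` is the minimum number of state changes on the example tree. Replacing `unit` with any other cost matrix leaves the recursion unchanged, and the inner `min` over child states is the step that a faster method for ultrametric or additive matrices would need to shortcut.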
