Analysis on the reconstruction accuracy of the Fitch method for inferring ancestral states

Background
As one of the most widely used parsimony methods for ancestral reconstruction, the Fitch method minimizes the total number of hypothetical substitutions along all branches of a tree to explain the evolution of a character. Because the method is so widely used, its reconstruction accuracy has attracted considerable study in recent years. However, most studies are restricted to 2-state evolutionary models, and a study of higher-state models is needed, since DNA sequences are 4-state series and protein sequences have 20 states.

Results
In this paper, the ambiguous and unambiguous reconstruction accuracies of the Fitch method are studied for N-state evolutionary models. Given an arbitrary phylogenetic tree, a recurrence system is first presented to calculate the two accuracies iteratively. As the complete binary tree and the comb-shaped tree are the two extremal evolutionary tree topologies with respect to balance, we focus on the reconstruction accuracies on these two topologies and analyze their asymptotic properties. Then, 1000 Yule trees with 1024 leaves are generated and analyzed to simulate real evolutionary scenarios. It is known that more taxa do not necessarily increase the reconstruction accuracies under 2-state models. Whether the same holds under N-state models is also tested.

Conclusions
In a large tree with many leaves, the reconstruction accuracies of using all taxa are sometimes less than those of using a leaf subset under N-state models. For complete binary trees, there always exists an equilibrium interval [a, b] of conservation probability in which the limiting ambiguous reconstruction accuracy equals the probability of randomly picking a state. The value b decreases as the number of states increases, and it seems to converge. When the conservation probability is greater than b, the reconstruction accuracies of the Fitch method increase rapidly. The reconstruction accuracies on 1000 simulated Yule trees exhibit similar behavior. For comb-shaped trees, the limiting reconstruction accuracies of using all taxa are always less than or equal to those of using the nearest root-to-leaf path when the conservation probability is not less than 1/N. As a result, more taxa are suggested for ancestral reconstruction when the tree topology is balanced and the sequences are highly similar, and a few taxa close to the root are recommended otherwise.
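
To make the quantities discussed above concrete, the following is a minimal Python sketch, not the recurrence system of the paper. It computes the Fitch root state set by the standard bottom-up pass and estimates the two accuracies by simulation on a complete binary tree. Several details are assumptions introduced for illustration: the function names (`fitch_root_set`, `evolve`, `estimate_accuracies`), the nested-tuple tree encoding, a symmetric N-state model in which each edge keeps its state with the conservation probability and otherwise moves to one of the other N - 1 states uniformly, and the reading of the accuracies as follows: the unambiguous accuracy is the probability that the root set is exactly the true state, and the ambiguous accuracy additionally credits 1/|S| when the true state lies in an ambiguous root set S.

```python
import random


def fitch_root_set(tree):
    """Bottom-up Fitch pass.

    A tree is either a leaf state (an int) or a pair (left, right).
    Returns the set of most parsimonious states at the root.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree])
    left, right = (fitch_root_set(child) for child in tree)
    common = left & right
    # Intersection if non-empty, otherwise union (one substitution is implied).
    return common if common else left | right


def evolve(depth, state, n_states, p_conserve, rng):
    """Simulate leaf states on a complete binary tree of the given depth.

    Assumed model: on each edge the state is kept with probability
    p_conserve, else it changes uniformly to one of the other states.
    Requires n_states >= 2 whenever p_conserve < 1.
    """
    if depth == 0:
        return state
    children = []
    for _ in range(2):
        child = state
        if rng.random() >= p_conserve:
            child = rng.choice([s for s in range(n_states) if s != state])
        children.append(evolve(depth - 1, child, n_states, p_conserve, rng))
    return tuple(children)


def estimate_accuracies(depth, n_states, p_conserve, trials=2000, seed=0):
    """Monte Carlo estimate of (unambiguous, ambiguous) accuracy at the root."""
    rng = random.Random(seed)
    unambiguous = 0.0
    ambiguous = 0.0
    for _ in range(trials):
        root = rng.randrange(n_states)
        leaves = evolve(depth, root, n_states, p_conserve, rng)
        root_set = fitch_root_set(leaves)
        if root in root_set:
            # Credit for picking uniformly at random from the root set.
            ambiguous += 1.0 / len(root_set)
            if len(root_set) == 1:
                unambiguous += 1.0
    return unambiguous / trials, ambiguous / trials
```

For example, `fitch_root_set(((0, 0), (0, 1)))` returns `frozenset({0})`, and `estimate_accuracies(3, 4, 1.0)` returns `(1.0, 1.0)` because no state ever changes. Sweeping `p_conserve` for a fixed `depth` and `n_states` gives a rough, simulation-based view of how the accuracies depend on the conservation probability. It does not replace the exact recurrence or the asymptotic analysis.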
