The Effects of Sequence Length, Tree Topology, and Number of Taxa on the Performance of Phylogenetic Methods

Simulations were used to study the performance of several character-based and distance-based phylogenetic methods in obtaining the correct tree from pseudo-randomly generated input data. The study included all topologies of unrooted binary trees with 4 to 10 pendant vertices (taxa) inclusive. The lengths of the character sequences ranged from 10 to 10^5 characters, increasing exponentially. The methods studied were Closest Tree, Compatibility, Li's method, Maximum Parsimony, Neighbor-joining, Neighborliness, and UPGMA. We also provide a modification of Li's method (SimpLi) that is consistent with additive data. We give estimates of the sequence lengths required for a given confidence in the output of these methods under the assumptions of molecular evolution used in this study. A notation for characterizing all tree topologies is described. We show that when the number of taxa, the maximum path length, and the minimum edge length are held constant, there is a small but significant dependence of the performance of the methods on the tree topology. We show that the methods that are consistent with the model used perform similarly, whereas the inconsistent methods, UPGMA and Li's method, perform very poorly.
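
The distinction between methods that are and are not consistent with additive data can be illustrated with a small sketch. The example below is not taken from the study: the four-taxon tree ((A,B),(C,D)), its edge lengths, and the function names are all hypothetical, chosen so that the distances are exactly additive but violate a molecular clock. It compares only the first pair each method would join: UPGMA joins the pair with the smallest distance, while Neighbor-joining joins the pair that minimizes the standard criterion Q(i,j) = (n-2)d(i,j) - r_i - r_j, where r_i is the sum of distances from taxon i to all other taxa.

```python
from itertools import combinations

# Hypothetical tree ((A,B),(C,D)); edge lengths are illustrative assumptions,
# picked so that B and D sit on long pendant edges (no molecular clock).
PENDANT = {"A": 1.0, "B": 5.0, "C": 1.0, "D": 5.0}
INTERNAL = 1.0
CHERRIES = [{"A", "B"}, {"C", "D"}]  # the true sibling pairs


def tree_distance(x, y):
    """Additive (path-length) distance between two taxa on the tree above."""
    d = PENDANT[x] + PENDANT[y]
    if {x, y} not in CHERRIES:
        d += INTERNAL  # the path crosses the internal edge
    return d


def upgma_first_pair(taxa):
    """First UPGMA step: join the pair with the smallest distance."""
    return min(combinations(taxa, 2), key=lambda p: tree_distance(*p))


def nj_first_pair(taxa):
    """First Neighbor-joining step: join the pair that minimizes Q."""
    n = len(taxa)
    # r[t] is the sum of distances from t to every other taxon.
    r = {t: sum(tree_distance(t, u) for u in taxa if u != t) for t in taxa}

    def q(pair):
        i, j = pair
        return (n - 2) * tree_distance(i, j) - r[i] - r[j]

    return min(combinations(taxa, 2), key=q)


taxa = sorted(PENDANT)
print("UPGMA joins first:", upgma_first_pair(taxa))
print("Neighbor-joining joins first:", nj_first_pair(taxa))
```

On these assumed distances, UPGMA first joins A and C, which are not siblings in the generating tree, because the two short pendant edges make them the closest pair. The Neighbor-joining criterion selects a true sibling pair. Since the distances here are exact, the error made by UPGMA cannot be removed by using longer sequences, which is what inconsistency means in this setting.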
