On the Problem of Discovering the Most Parsimonious Tree

The problem of discovering the most parsimonious tree is defined in terms of a set of linearly arrayed sequences. Simplifications are introduced to reduce the total amount of work, including the elimination of uninformative positions and the recognition of equivalent positions. The procedure can be applied to any array of sequences, including amino acid sequences. It is shown, however, that failure to convert such sequences, through the genetic code, into nucleotide sequences wastes much pertinent information. Parsimony is shown to be a procedure that minimizes discordancies (parallel and/or back substitutions). A procedure (a discordancy diagram) is given that enables one to recognize when two characters (nucleotide positions) will necessitate the acceptance of such discordancies and the minimum number that will be unavoidable. Subtracting these unavoidable discordancies from a matrix of potential discordancies yields a matrix of avoidable discordancies. This matrix generally gives at least two pairs of taxa that are most closely related parsimoniously (i.e., zero avoidable discordancies), and each such pair may therefore be replaced by an ancestral form determined by the parsimony process, which is also given. The process may be repeated until the tree is completed. A method of determining a lower bound on the number of substitutions required is given that yields a much larger lower bound than previous estimates. A quick estimate of the upper bound is also provided. An alternative approach, using a Prim-Kruskal network (minimal spanning tree) on the avoidable discordancy distances, is given together with a procedure for interpreting such networks in terms of a phylogeny that appears more natural than the dendrograms usually employed in the interpretation of single-linkage diagrams.
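The two simplifications named above (eliminating uninformative positions and recognizing equivalent positions) can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: a position is taken to be informative when at least two states each occur in at least two taxa, and two positions are taken to be equivalent when they split the taxa into the same groups regardless of which states label the groups. The function names are hypothetical.

```python
from collections import Counter


def is_informative(column):
    """Assumed criterion: at least two states each occur in at least two taxa."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def partition_pattern(column):
    """Relabel states by order of first appearance, so that columns which
    group the taxa identically receive the same pattern."""
    labels = {}
    return tuple(labels.setdefault(state, len(labels)) for state in column)


def reduce_positions(sequences):
    """Map each distinct informative pattern to the positions that share it.

    `sequences` is a list of equal-length aligned strings, one per taxon.
    """
    groups = {}
    for index, column in enumerate(zip(*sequences)):
        if is_informative(column):
            groups.setdefault(partition_pattern(column), []).append(index)
    return groups


if __name__ == "__main__":
    # Hypothetical aligned nucleotide sequences for four taxa.
    aligned = ["AAGTA", "AAGCA", "GGACA", "GGATG"]
    for pattern, positions in reduce_positions(aligned).items():
        print(pattern, positions)
```

Under these assumptions each group of equivalent positions need only be analysed once and weighted by its size, and the last position of the hypothetical alignment is dropped because only one of its states occurs in more than one taxon.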
The reduction of a strictly bifurcating tree to a Prim-Kruskal network is called compression, and its reverse is called decompression; the original tree is recovered after a complete cycle of the two processes. There is a one-to-one correspondence between any phylogenetic tree and its compressed network form. The compressed form can be the basis for an unambiguous linear representation of a tree that is more compact than that of any other known method of representation.
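As a rough illustration of the Prim-Kruskal network mentioned above, the following sketch builds a minimal spanning tree with Prim's algorithm from a symmetric distance matrix. The matrix is hypothetical; the computation of avoidable discordancy distances and the compression and decompression procedures themselves are not reproduced here.

```python
def prim_mst(dist):
    """Return the edges (i, j, d) of a minimal spanning tree.

    `dist` is a symmetric square matrix (list of lists) of pairwise
    distances between taxa, indexed 0..n-1. Growth starts at taxon 0.
    """
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge joining a taxon in the tree to one outside it.
        d, i, j = min(
            (dist[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j, d))
        in_tree.add(j)
    return edges


if __name__ == "__main__":
    # Hypothetical distances among four taxa.
    distances = [
        [0, 1, 4, 3],
        [1, 0, 2, 5],
        [4, 2, 0, 1],
        [3, 5, 1, 0],
    ]
    for i, j, d in prim_mst(distances):
        print(f"taxon {i} - taxon {j}: {d}")
```

This simple form rescans all candidate edges at each step, which is adequate for the small numbers of taxa typical of hand-worked examples; a priority queue would be the usual choice for larger inputs.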
