Evolutionary Trees and Ordinal Assertions

Abstract. Sequence data for a group of species is often summarized by a distance matrix M, where M[s,t] is the dissimilarity between the sequences of species s and t. An ordinal assertion is a statement of the form "species a and b are at least as similar as species c and d", and it is supported by a distance matrix M if M[a,b] ≤ M[c,d]. Recent preliminary research suggests that ordinal assertions can be used effectively to reconstruct the evolutionary history of a group of species. However, further research on the mathematical and algorithmic properties of ordinal assertions is needed to support the development and assessment of inference methods that use them to reconstruct evolutionary histories.

A (weighted) ordinal representation of a distance matrix M is a (weighted) phylogeny T such that, for all species a, b, c, and d labeling T, d_T(a,b) ≤ d_T(c,d) if and only if M[a,b] ≤ M[c,d]; here d_T is the weighted path length when T is weighted and the unweighted path length otherwise. An ordinal representation of M is therefore a phylogeny that supports exactly the ordinal assertions supported by M, which makes it the focus of our examination of the mathematical and algorithmic properties of ordinal assertions. As it turns out, ordinal representations are rich in structure. This paper presents several results on weighted and unweighted ordinal representations:

- The unweighted ordinal representation of a distance matrix is unique. This generalizes the well-known result that no two phylogenies share the same distance matrix [10], [21].
- The unweighted ordinal representation of a distance matrix can be found in O(n^2 log^2(n)) time. The algorithm presented improves upon an O(n^3) algorithm by Kannan and Warnow [13] that finds binary unweighted ordinal representations of distance matrices.
- Under certain conditions, weighted ordinal representations can be found in polynomial time.
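The definition of a (weighted or unweighted) ordinal representation can be checked directly by brute force. The Python sketch below illustrates only that definition; it is not the O(n^2 log^2(n)) construction algorithm referred to above. It assumes a tree given as an adjacency map with species labeling the leaves, a distance matrix stored as a nested dict keyed by species name, and it compares only pairs of distinct species. The helper names (`path_lengths`, `supports`, `is_ordinal_representation`, `symmetric`) and the example data are hypothetical. The check takes O(n^4) time for n species.

```python
from itertools import combinations


def path_lengths(tree, leaves, weighted=True):
    """Return d_T as a dict mapping frozenset({a, b}) -> path length.

    tree: adjacency map node -> list of (neighbour, edge_weight).
    If weighted is False, every edge counts as length 1 (unweighted d_T).
    """
    leaf_set = set(leaves)
    dist = {}
    for source in leaves:
        # Iterative DFS from `source`; in a tree each node has exactly one path.
        stack = [(source, None, 0)]
        while stack:
            node, parent, d = stack.pop()
            if node in leaf_set and node != source:
                dist[frozenset((source, node))] = d
            for nbr, w in tree[node]:
                if nbr != parent:
                    stack.append((nbr, node, d + (w if weighted else 1)))
    return dist


def supports(M, a, b, c, d):
    """True if M supports 'a and b are at least as similar as c and d'."""
    return M[a][b] <= M[c][d]


def is_ordinal_representation(M, tree, leaves, weighted=True):
    """Brute-force check: for all pairs of distinct species {a,b} and {c,d},
    d_T(a,b) <= d_T(c,d) holds exactly when M[a][b] <= M[c][d]."""
    dT = path_lengths(tree, leaves, weighted)
    pairs = list(combinations(leaves, 2))
    for a, b in pairs:
        for c, d in pairs:
            tree_says = dT[frozenset((a, b))] <= dT[frozenset((c, d))]
            if tree_says != supports(M, a, b, c, d):
                return False
    return True


def symmetric(entries):
    """Build a symmetric nested-dict matrix from {(s, t): value}."""
    M = {}
    for (s, t), v in entries.items():
        M.setdefault(s, {})[t] = v
        M.setdefault(t, {})[s] = v
    return M


# Hypothetical quartet tree ((A,B),(C,D)) with internal nodes u and v.
tree = {
    "A": [("u", 1)], "B": [("u", 1)],
    "C": [("v", 1)], "D": [("v", 2)],
    "u": [("A", 1), ("B", 1), ("v", 5)],
    "v": [("u", 5), ("C", 1), ("D", 2)],
}
leaves = ["A", "B", "C", "D"]

M1 = symmetric({("A", "B"): 1, ("C", "D"): 1, ("A", "C"): 5,
                ("A", "D"): 5, ("B", "C"): 5, ("B", "D"): 5})
M2 = symmetric({("A", "B"): 1, ("C", "D"): 2, ("A", "C"): 3,
                ("B", "C"): 3, ("A", "D"): 4, ("B", "D"): 4})

print(is_ordinal_representation(M1, tree, leaves, weighted=False))  # True
print(is_ordinal_representation(M1, tree, leaves, weighted=True))   # False
print(is_ordinal_representation(M2, tree, leaves, weighted=True))   # True
print(is_ordinal_representation(M2, tree, leaves, weighted=False))  # False
```

In this example the same topology is an unweighted ordinal representation of M1, whose pairs {A,B} and {C,D} are tied, while with the given edge weights it is a weighted ordinal representation of M2 but not of M1. Integer weights are used so that ties compare exactly; with floating-point weights, path lengths that should be equal may not compare equal.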

[1] R. Sokal et al. Principles of numerical taxonomy, 1965.

[2] W. Fitch et al. Construction of phylogenetic trees, 1967, Science.

[3] W. H. Day. Computational complexity of inferring phylogenies from dissimilarity matrices, 1987, Bulletin of Mathematical Biology.

[4] Paul E. Kearney et al. A six-point condition for ordinal matrices, 1997, J. Comput. Biol.

[5] J. Felsenstein. Numerical methods for inferring evolutionary trees, 1982, The Quarterly Review of Biology.

[6] W. A. Beyer et al. A molecular sequence metric and evolutionary trees, 1974.

[7] W. A. Beyer et al. Additive evolutionary trees, 1977, Journal of Theoretical Biology.

[8] J. Huelsenbeck et al. Hobgoblin of phylogenetics?, 1994, Nature.

[9] Piotr Rudnicki et al. A fast algorithm for constructing trees from distance matrices, 1989, Inf. Process. Lett.

[10] A. Dress et al. Split decomposition: a new and useful approach to phylogenetic analysis of distance data, 1992, Molecular Phylogenetics and Evolution.

[11] Paul E. Kearney et al. The ordinal quartet method, 1998, RECOMB '98.

[12] L. Cavalli-Sforza et al. Phylogenetic analysis: models and estimation procedures, 1967, Evolution.

[13] Sampath Kannan et al. Tree reconstruction from partial orders, 1995, SIAM J. Comput.

[14] N. Saitou et al. The neighbor-joining method: a new method for reconstructing phylogenetic trees, 1987, Molecular Biology and Evolution.

[15] Henk Meijer et al. Inferring evolutionary trees from ordinal data, 1997, SODA '97.

[16] S. S. Yau et al. Distance matrix of a graph and its realizability, 1965.

[17] Sampath Kannan et al. A robust model for finding optimal evolutionary trees, 1993, Algorithmica.

[18] T. Jukes. Evolution of protein molecules (Chapter 24), 1969.

[19] A. Tversky et al. Additive similarity trees, 1977.