On the consistency of the minimum evolution principle of phylogenetic inference

The goal of phylogenetic inference is the reconstruction of the evolutionary history of various biological entities (taxa) such as genes, proteins, viruses or species. Phylogenetic inference is of major importance in computational biology and has numerous applications, ranging from the study of biodiversity to sequence analysis. Given a matrix of pairwise distances between taxa, the minimum evolution (ME) principle consists of selecting the tree whose length is minimal, where the tree length is estimated within the least-squares framework. The ME principle has been shown to be statistically consistent when using the ordinary least-squares criterion (OLS) and inconsistent with the more general weighted least-squares criterion (WLS). Unfortunately, the OLS+ME inference method can give poor results, since it does not take the variances of the input data into account. Here we study a model, classical in statistics and data analysis, that lies between OLS and WLS, and we prove that the ME principle is statistically consistent within this model. Our proof is inductive and relies on a time-optimal recursive algorithm for estimating edge lengths. As a corollary, we obtain a different and simpler proof of the consistency result for OLS+ME.
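The OLS+ME procedure described above can be illustrated on four taxa. The following is a minimal sketch under stated assumptions, not the method of this paper: it does not implement the recursive edge-length algorithm or the intermediate OLS/WLS model studied here. For each of the three unrooted quartet topologies, it builds the path-edge incidence matrix A, estimates edge lengths b by ordinary least squares through the normal equations (A^T A) b = A^T d, sums them into the tree length, and keeps the topology of minimal length. The split encoding of topologies and the function names are illustrative assumptions. Exact `Fraction` arithmetic is used so that the estimates are not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative sketch only: QUARTETS, design_matrix, ols_edge_lengths and
# minimum_evolution are hypothetical names, not taken from the paper.
# A quartet topology is encoded as a split ((a, b), (c, d)): leaves a and b hang off
# one internal node, leaves c and d off the other. Edge k (k = 0..3) is the pendant
# edge of leaf k; edge 4 is the single internal edge.
QUARTETS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
PAIRS = list(combinations(range(4), 2))  # (0, 1), (0, 2), ..., (2, 3)
N_EDGES = 5


def design_matrix(split):
    """Path-edge incidence matrix: row (i, j) has a 1 for each edge on the i-j path."""
    side = {leaf: k for k, group in enumerate(split) for leaf in group}
    rows = []
    for i, j in PAIRS:
        row = [0] * N_EDGES
        row[i] = row[j] = 1        # both pendant edges always lie on the path
        if side[i] != side[j]:     # the path crosses the internal edge
            row[4] = 1
        rows.append(row)
    return rows


def solve(m, v):
    """Solve the square nonsingular system m x = v by Gauss-Jordan elimination."""
    n = len(v)
    aug = [[Fraction(x) for x in m[r]] + [Fraction(v[r])] for r in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The left block is now diagonal, so each unknown is a single division.
    return [aug[r][n] / aug[r][r] for r in range(n)]


def ols_edge_lengths(split, d):
    """OLS edge lengths for one topology: solve (A^T A) b = A^T d."""
    a = design_matrix(split)
    dv = [d[i][j] for i, j in PAIRS]
    ata = [[sum(a[k][p] * a[k][q] for k in range(len(PAIRS))) for q in range(N_EDGES)]
           for p in range(N_EDGES)]
    atd = [sum(a[k][p] * dv[k] for k in range(len(PAIRS))) for p in range(N_EDGES)]
    return solve(ata, atd)


def minimum_evolution(d):
    """Return (split, tree length) of the quartet topology with minimal OLS length."""
    lengths = {split: sum(ols_edge_lengths(split, d)) for split in QUARTETS}
    best = min(lengths, key=lengths.get)
    return best, lengths[best]


if __name__ == "__main__":
    # Additive distances generated by the tree 01|23 with pendant edges 1, 2, 3, 4
    # and internal edge 5.
    d = [[0, 3, 9, 10],
         [3, 0, 10, 11],
         [9, 10, 0, 7],
         [10, 11, 7, 0]]
    for split in QUARTETS:
        b = ols_edge_lengths(split, d)
        print(split, [str(x) for x in b], "length =", sum(b))
    print("ME choice:", minimum_evolution(d))
```

On this additive matrix, the topology 01|23 recovers the generating edge lengths 1, 2, 3, 4, 5 exactly, with tree length 15. The two other topologies both receive length 35/2 and a negative internal edge estimate of -5/2, so ME selects the true tree. For a quartet, the OLS length of 01|23 also has the closed form ½(d01 + d23) + ¼(d02 + d03 + d12 + d13), which the sketch reproduces numerically.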
