Comparing evolutionary distances via adaptive distance functions.

Distance-based methods for phylogenetic reconstruction follow a two-step approach: first, pairwise distances are computed from DNA sequences associated with a given set of taxa, and then these distances are used to reconstruct the phylogenetic relationships between the taxa. Because the distances are estimated from finite sequences, they are inherently noisy, and this noise may result in reconstruction errors. Previous attempts to improve reconstruction accuracy focused either on making reconstruction algorithms more robust to this stochastic noise, or on improving the accuracy of the distance estimates. Here, we aim to further improve reconstruction accuracy by exploiting the basic observation that reconstruction algorithms rely on a series of comparisons between distances (or linear combinations of distances). We start by examining the relationship between the stochastic noise in the sequence data and the accuracy of comparisons between pairwise distance estimates. This examination yields improved methods for distance comparison, which are shown to be as accurate as likelihood-based methods while being much simpler and more efficient to compute. We then extend these methods to improve the reconstruction accuracy of quartet trees, and examine some of the remaining challenges.
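
To make the two-step approach and the role of distance comparisons concrete, the following is a minimal sketch in Python. It is not the adaptive distance functions studied in this work: it assumes the standard Kimura two-parameter (K2P) distance for step one and the classical four-point rule for resolving a quartet in step two. The function names (`k2p_distance`, `pairwise_distances`, `four_point_topology`) and the toy sequences are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(s1, s2):
    """Standard K2P distance estimate between two aligned DNA sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    n = len(s1)
    ts = sum((a, b) in TRANSITIONS for a, b in zip(s1, s2))
    tv = sum(a != b and (a, b) not in TRANSITIONS for a, b in zip(s1, s2))
    p, q = ts / n, tv / n  # observed transition / transversion proportions
    u, v = 1 - 2 * p - q, 1 - 2 * q
    if u <= 0 or v <= 0:
        return math.inf  # saturated: the estimate is undefined
    return -0.5 * math.log(u) - 0.25 * math.log(v)


def pairwise_distances(seqs):
    """Step 1: estimate a distance for every pair of taxa.

    `seqs` maps taxon name -> aligned sequence; the result is keyed by
    frozenset({x, y}) so that the pair order does not matter.
    """
    return {
        frozenset((x, y)): k2p_distance(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)
    }


def four_point_topology(d, taxa):
    """Step 2 for a quartet: choose the split with the smallest distance sum.

    The decision is a comparison between linear combinations of distances,
    so noise in the estimates can flip it.
    """
    a, b, c, e = taxa
    sums = {
        ((a, b), (c, e)): d[frozenset((a, b))] + d[frozenset((c, e))],
        ((a, c), (b, e)): d[frozenset((a, c))] + d[frozenset((b, e))],
        ((a, e), (b, c)): d[frozenset((a, e))] + d[frozenset((b, c))],
    }
    return min(sums, key=sums.get)


# Toy usage with made-up sequences (for illustration only).
toy = {
    "a": "ACGTACGTACGT",
    "b": "ACGTACGTACGC",
    "c": "TCGTTCGTACGT",
    "d": "TCGTTCGTACGA",
}
print(four_point_topology(pairwise_distances(toy), ("a", "b", "c", "d")))
```

In this sketch the quartet is resolved purely by comparing three sums of estimated distances, which is the kind of comparison whose sensitivity to stochastic noise the abstract refers to.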
