Topological Bias in Distance-Based Phylogenetic Methods: Problems with Over- and Underestimated Genetic Distances

I demonstrate several types of topological bias in distance-based phylogenetic methods that estimate branch lengths by least squares and choose the best tree by either the minimum evolution (ME) or the Fitch-Margoliash (FM) criterion. A 6-species tree can take one of two shapes: one with three cherries and one with two, where a cherry is a pair of adjacent leaves descending from their most recent common ancestor. When genetic distances are underestimated, both the ME and the FM criterion favor the 3-cherry shape. When genetic distances are overestimated, the ME criterion favors the 2-cherry shape, whereas the direction of bias under the FM criterion depends on whether negative branch lengths are allowed: allowing them favors the 3-cherry shape, and disallowing them favors the 2-cherry shape. The extent of the bias is explored by computer simulation of sequence evolution.
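
The two tree shapes and the two optimality criteria can be made concrete with a minimal Python sketch. It is an illustration only, not the procedure used in the paper: it assumes unrooted trees stored as edge lists, takes branch lengths as given rather than estimating them by least squares, and uses the standard definitions of the ME score (sum of branch lengths) and the FM score (squared differences between observed and tree-path distances, weighted by the inverse squared observed distance). All function names, node labels, and numeric values are hypothetical.

```python
from itertools import combinations

def build_adjacency(edges):
    """Map each node to {neighbor: branch_length} for an unrooted tree."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj

def leaves(adj):
    """Leaves are the nodes with exactly one neighbor."""
    return sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)

def count_cherries(adj):
    """Count pairs of leaves attached to the same internal node."""
    leaf_set = set(leaves(adj))
    cherries = 0
    for node, nbrs in adj.items():
        if node in leaf_set:
            continue
        k = sum(1 for n in nbrs if n in leaf_set)
        cherries += k * (k - 1) // 2
    return cherries

def path_length(adj, start, goal):
    """Sum of branch lengths on the unique path between two nodes."""
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nbr, length in adj[node].items():
            if nbr != parent:
                stack.append((nbr, node, dist + length))
    raise ValueError("nodes are not connected")

def me_score(edges):
    """Minimum evolution criterion: total tree length (smaller is better)."""
    return sum(length for _, _, length in edges)

def fm_score(adj, observed):
    """Fitch-Margoliash criterion: sum of (d - p)^2 / d^2 over leaf pairs,
    where d is the observed distance and p the path length on the tree."""
    total = 0.0
    for a, b in combinations(leaves(adj), 2):
        d = observed[(a, b)]
        p = path_length(adj, a, b)
        total += (d - p) ** 2 / d ** 2
    return total

# Hypothetical 6-species trees with every branch set to 0.1.
# 3-cherry shape: cherries (A,B), (C,D), (E,F) joined at a central node X.
three_cherry = [("A", "U", 0.1), ("B", "U", 0.1), ("C", "V", 0.1),
                ("D", "V", 0.1), ("E", "W", 0.1), ("F", "W", 0.1),
                ("U", "X", 0.1), ("V", "X", 0.1), ("W", "X", 0.1)]
# 2-cherry shape: cherries (A,B) and (E,F) at the ends, C and D in between.
two_cherry = [("A", "U", 0.1), ("B", "U", 0.1), ("U", "V", 0.1),
              ("C", "V", 0.1), ("V", "W", 0.1), ("D", "W", 0.1),
              ("W", "X", 0.1), ("E", "X", 0.1), ("F", "X", 0.1)]

adj3 = build_adjacency(three_cherry)
adj2 = build_adjacency(two_cherry)

# Distances that fit the 3-cherry tree exactly, then uniformly shrunk by 20%
# to mimic underestimated genetic distances (an arbitrary illustrative factor).
exact = {(a, b): path_length(adj3, a, b)
         for a, b in combinations(leaves(adj3), 2)}
shrunk = {pair: 0.8 * d for pair, d in exact.items()}

print(count_cherries(adj3), count_cherries(adj2))
print(me_score(three_cherry), fm_score(adj3, exact), fm_score(adj3, shrunk))
```

In a full analysis the branch lengths would be re-estimated by least squares for each candidate topology before the ME or FM scores are compared; the sketch skips that step and only shows how each criterion is evaluated once branch lengths are known.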
