Phylogenetic analysis using parsimony and likelihood methods

The assumptions underlying the maximum-parsimony (MP) method of phylogenetic tree reconstruction were intuitively examined by studying the way the method works. Computer simulations were performed to corroborate the intuitive examination. Parsimony appears to involve very stringent assumptions concerning the process of sequence evolution, such as constancy of substitution rates between nucleotides, constancy of rates across nucleotide sites, and equal branch lengths in the tree. For practical data analysis, the requirement of equal branch lengths means similar substitution rates among lineages (the existence of an approximate molecular clock), relatively long interior branches, and also few species in the data. However, a small amount of evolution is neither a necessary nor a sufficient requirement of the method. The difficulties involved in the application of current statistical estimation theory to tree reconstruction were discussed, and it was suggested that the approach proposed by Felsenstein (1981, J. Mol. Evol. 17: 368–376) for topology estimation, as well as its many variations and extensions, differs fundamentally from the maximum likelihood estimation of a conventional statistical parameter. Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter. Computer simulations were performed to study the probability that MP recovers the true tree under a hierarchy of models of nucleotide substitution; its performance relative to the likelihood method was especially noted. The results appeared to support the intuitive examination of the assumptions underlying MP. When a simple model of nucleotide substitution was assumed to generate data, the probability that MP recovers the true topology could be as high as, or even higher than, that for the likelihood method.
When the assumed model became more complex and realistic, e.g., when substitution rates were allowed to differ between nucleotides or across sites, the probability that MP recovers the true topology, and especially its performance relative to that of the likelihood method, generally deteriorated. As the complexity of the process of nucleotide substitution in real sequences is well recognized, the likelihood method appears preferable to parsimony. However, the development of a statistical methodology for the efficient estimation of the tree topology remains a difficult open problem.
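
The kind of simulation described above can be illustrated with a minimal sketch. It is not the authors' simulation code: it assumes the simplest substitution model (Jukes-Cantor, equal rates between nucleotides and across sites), a four-taxon tree whose true topology groups taxa 1 and 2 against 3 and 4, and branch lengths chosen purely for illustration. All function names are hypothetical. For four taxa, MP reduces to counting the three informative site patterns, so the sketch estimates how often the true grouping receives strictly more supporting sites than either alternative.

```python
import math
import random

NUCS = "ACGT"

def jc69_change_prob(t):
    """Probability that a site differs after a branch of length t
    (expected substitutions per site) under the Jukes-Cantor model."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))

def evolve(state, t, rng):
    """Evolve one nucleotide along a branch of length t."""
    if rng.random() < jc69_change_prob(t):
        return rng.choice([n for n in NUCS if n != state])
    return state

def simulate_site(branches, rng):
    """Simulate one site on the true tree ((1,2),(3,4)).
    branches = (b1, b2, b3, b4, interior): illustrative values, an assumption."""
    b1, b2, b3, b4, interior = branches
    x = rng.choice(NUCS)           # interior node joined to taxa 1 and 2
    y = evolve(x, interior, rng)   # interior node joined to taxa 3 and 4
    return (evolve(x, b1, rng), evolve(x, b2, rng),
            evolve(y, b3, rng), evolve(y, b4, rng))

def parsimony_support(sites):
    """Count parsimony-informative sites supporting each four-taxon topology."""
    counts = {"12|34": 0, "13|24": 0, "14|23": 0}
    for s1, s2, s3, s4 in sites:
        if s1 == s2 and s3 == s4 and s1 != s3:
            counts["12|34"] += 1
        elif s1 == s3 and s2 == s4 and s1 != s2:
            counts["13|24"] += 1
        elif s1 == s4 and s2 == s3 and s1 != s2:
            counts["14|23"] += 1
    return counts

def mp_recovery_rate(branches, n_sites, n_reps, seed=0):
    """Fraction of replicates in which MP uniquely picks the true topology.
    Ties are counted as failures, a conservative choice made for this sketch."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        sites = [simulate_site(branches, rng) for _ in range(n_sites)]
        counts = parsimony_support(sites)
        best = max(counts.values())
        winners = [k for k, v in counts.items() if v == best]
        if winners == ["12|34"]:
            hits += 1
    return hits / n_reps

if __name__ == "__main__":
    # Roughly equal branches with a longer interior branch.
    print(mp_recovery_rate((0.1, 0.1, 0.1, 0.1, 0.2), n_sites=200, n_reps=100))
    # Two long, non-adjacent branches and a short interior branch.
    print(mp_recovery_rate((1.0, 0.05, 1.0, 0.05, 0.05), n_sites=200, n_reps=100))
```

Comparing the two calls shows how the recovery rate depends on the branch-length configuration, which is the kind of dependence the abstract attributes to the equal-branch-length assumption. Extending the sketch to unequal rates between nucleotides or across sites would require replacing `jc69_change_prob` and `evolve` with a richer model.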
