A Likelihood Approach to Estimating Phylogeny from Discrete Morphological Character Data

Abstract. Evolutionary biologists have adopted simple likelihood models to estimate ancestral states and to evaluate character independence on specified phylogenies; however, for estimating phylogenies from discrete morphological data, maximum parsimony remains the only option. This paper explores the possibility of using standard, well-behaved Markov models to estimate morphological phylogenies (including branch lengths) under the likelihood criterion. An important modification of standard Markov models involves making the likelihood conditional on characters being variable, because constant characters are absent from morphological data sets. Without this modification, branch lengths are often overestimated, resulting in potentially serious biases in tree topology selection. Several new avenues of research are opened by an explicitly model-based approach to phylogenetic analysis of discrete morphological data, including combined-data likelihood analyses (morphology + sequence data), likelihood ratio tests, and Bayesian analyses.
[Discrete morphological character; Markov model; maximum likelihood; phylogeny.]
The increased availability of nucleotide and protein sequences from a diversity of organisms and genes has stimulated the development of stochastic models describing evolutionary change in molecular sequences over time. Such models are useful not only for estimating molecular evolutionary parameters of interest but also as the basis for phylogenetic inference by maximum likelihood (ML) and Bayesian methods. ML provides a very general framework for estimation and has been applied extensively in diverse fields of science (Casella and Berger, 1990); however, its popularity in phylogenetic inference has lagged behind that of other optimality criteria (such as maximum parsimony), primarily because evaluating any given candidate tree is far more computationally costly under ML.
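The abstract's key modification, conditioning the likelihood on characters being variable, is easiest to see in a deliberately tiny case. The sketch below assumes two taxa joined by a path of length t (in expected changes per character), a k-state, equal-rates generalization of JC69 with a uniform root distribution, and a data set in which every character differs between the two taxa, as happens when constant characters are never recorded. The setup and the function names are illustrative assumptions, not the paper's own formulation.

```python
import math


def prob_specific_change(t: float, k: int) -> float:
    """Probability that a path of length t ends in one particular state
    different from its starting state, under an assumed k-state equal-rates
    (JC69-like) model with t in expected changes per character."""
    return (1.0 - math.exp(-k * t / (k - 1))) / k


def prob_pattern(t: float, k: int) -> float:
    """Probability of one particular variable two-taxon pattern, e.g. (0, 1):
    the first taxon's state has uniform probability 1/k."""
    return prob_specific_change(t, k) / k


def prob_variable(t: float, k: int) -> float:
    """Probability that the two taxa differ at all: the sum over all
    k * (k - 1) ordered pairs of distinct states."""
    return k * (k - 1) * prob_pattern(t, k)


def log_likelihoods(t: float, k: int, m: int) -> tuple[float, float]:
    """Log-likelihood of m characters that all differ between the two taxa,
    (a) ignoring and (b) conditioning on the absence of constant characters."""
    unconditioned = m * math.log(prob_pattern(t, k))
    conditioned = m * math.log(prob_pattern(t, k) / prob_variable(t, k))
    return unconditioned, conditioned


if __name__ == "__main__":
    for t in (0.1, 1.0, 10.0):
        u, c = log_likelihoods(t, k=2, m=5)
        print(f"t = {t:5.1f}   unconditioned = {u:9.4f}   conditioned = {c:9.4f}")
```

Without conditioning, the log-likelihood rises monotonically with t, so the branch-length estimate runs off toward infinity. Dividing by the probability that a character is variable removes that spurious reward for long branches. In this degenerate two-taxon case the conditioned likelihood is flat in t (its value is m log[1/(k(k − 1))]), so the data carry no information about t at all. The example only shows that conditioning removes the push toward long branches.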
Recent algorithmic advances in ML phylogeny reconstruction (Olsen et al., 1994; Lewis, 1998; Salter and Pearl, 2001; Swofford, 2001) have reduced this computational cost substantially, and ML phylogeny estimates involving hundreds of terminal taxa are now becoming feasible. Bayesian methods, which are built on the likelihood function, offer the prospect of meaningful nodal support measures without the heavy computational burden imposed by existing methods such as bootstrapping (Rannala and Yang, 1996; Yang and Rannala, 1997; Larget and Simon, 1999; Mau et al., 1999; Huelsenbeck, 2000a). Furthermore, the Bayesian approach makes it possible to test hypotheses involving phylogenies without depending on any particular hypothesized tree (e.g., Huelsenbeck, 2000b). For these reasons, likelihood models can be expected to play an ever-increasing role in systematics and related disciplines.
ML, least squares, and minimum evolution differ from maximum parsimony in being model-based optimality criteria. ML and maximum parsimony, in turn, are both discrete-character methods, unlike minimum evolution and least squares, which operate on a matrix of pairwise evolutionary distances between terminal taxa. Despite the early availability of a likelihood model for continuous traits (Felsenstein, 1973), model-based optimality criteria have so far been applied primarily to molecular data, and maximum parsimony is the only criterion applied to both discrete morphological and molecular data.
Models have been applied to discrete morphological traits, but their purpose has been to infer ancestral states (e.g., Schluter et al., 1997; Mooers and Schluter, 1999; Pagel, 1999), to assess the magnitude of the evolutionary correlation between different traits (Pagel, 1994), or to investigate the properties of other optimality criteria (Felsenstein, 1981a), not to reconstruct phylogenies per se. Although likelihood has not previously been proposed for estimating trees from discrete morphological data, two models have been described for investigating properties of the parsimony method. Goldman (1990) described a simple likelihood model (hereafter the G90 model) that always chooses exactly the same tree (or trees) as equal-weighted Fitch parsimony. Later, Penny et al. (1994) and Tuffley and Steel (1997) found that a very different model (hereafter the TS97 model) also selects trees identical to those selected by parsimony. The G90 model has only one branch (i.e., edge) length parameter, which governs the probability of observing a change across any branch of the tree; however, the model requires implicit estimation of the ancestral character states at each interior node of the tree. Goldman (1990) emphasized that a likely side effect of these nuisance parameters, whose number grows with the number of characters, is statistical inconsistency. A method is statistically consistent if its estimates converge to the true value of the quantity being estimated as the sample size increases to infinity (Casella and Berger, 1990:323). Statistical consistency is thus a desirable asymptotic property of a statistical inference method, as has been pointed out numerous times in comparisons of likelihood and parsimony methods (e.g., Felsenstein, 1978).
914 SYSTEMATIC BIOLOGY VOL. 50
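For reference, equal-weighted Fitch parsimony, the criterion whose tree choices the G90 and TS97 models reproduce, scores a tree by the minimum number of state changes summed over characters. Below is a minimal sketch of Fitch's set-intersection pass for unordered characters. It assumes rooted binary trees encoded as nested tuples and a taxon-to-state-string matrix. The encodings, the helper names `fitch` and `parsimony_length`, and the toy matrix are all invented for illustration.

```python
from typing import Dict, Tuple, Union

# A rooted binary tree as nested 2-tuples of taxon names (illustrative format).
Tree = Union[str, tuple]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[set, int]:
    """Fitch's bottom-up pass for one unordered character: return the state
    set assigned to this node and the number of changes in its subtree."""
    if isinstance(tree, str):  # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:  # children agree on some state: no extra change needed
        return shared, left_cost + right_cost
    # children disagree: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Equal-weighted tree length: Fitch changes summed over all characters."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for j in range(n_chars):
        column = {taxon: row[j] for taxon, row in matrix.items()}
        total += fitch(tree, column)[1]
    return total


matrix = {"A": "001", "B": "000", "C": "110", "D": "111"}  # toy data
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
print(parsimony_length(tree_ab_cd, matrix), parsimony_length(tree_ac_bd, matrix))
```

The first tree is shorter (4 versus 6 steps), so parsimony prefers it, and by the results cited above so would the G90 and TS97 models. For unordered characters, Fitch lengths do not depend on where the root is placed, so the rooted encoding is only a convenience.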
The TS97 model is also very parameter-rich. For a problem involving n taxa and m characters, the TS97 model has effectively m(2n − 3) separate parameters: one for every branch/character combination, since an unrooted, fully resolved tree of n taxa has 2n − 3 branches. Tuffley and Steel called this the “no common mechanism” model because it allows the rate of evolution of a particular character on a particular branch to be independent of the rate for every other branch/character combination. Tuffley and Steel (1997:599) cautioned, however, that “. . . the number of parameters being estimated grows linearly with the number of characters, so the statistical consistency of these two methods is not guaranteed by standard results. Indeed, the former method can be provably statistically inconsistent . . . .” Here, “former method” refers to the “no common mechanism” model. The G90 and TS97 models thus have very little in common except that both are parsimony models (i.e., the set of tree topologies chosen is identical to the set chosen by parsimony) and that the number of parameters in both grows with the number of characters. Goldman (1990) emphasized the importance, in models used for phylogenetic inference, of using only structural parameters (parameters that appear in the likelihood function for all characters) and avoiding incidental parameters (parameters that appear in the likelihood functions for only some characters). In the classical models currently used for ML phylogeny reconstruction, all parameters are structural. For example, the transition/transversion rate ratio parameter of the HKY85 model (Hasegawa et al., 1985) is needed to calculate the likelihood for every site, as is every branch length parameter and every nucleotide frequency parameter in this model.
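Counting parameters makes the difference concrete: some schemes have parameter counts that grow with the data, and others do not. A small sketch follows. It assumes unrooted, fully resolved trees, which have 2n − 3 branches and n − 2 interior nodes. G90 is counted as one shared branch parameter plus one ancestral state per interior node per character, following the description above. The third scheme, in which the only parameters are branch lengths shared by all characters, is an assumed point of comparison for a model with structural parameters only. Function names are hypothetical.

```python
def n_branches(n_taxa: int) -> int:
    """Branches in an unrooted, fully resolved tree with n_taxa leaves."""
    return 2 * n_taxa - 3


def n_interior_nodes(n_taxa: int) -> int:
    """Interior nodes in the same kind of tree."""
    return n_taxa - 2


def parameter_counts(n_taxa: int, n_chars: int) -> dict:
    """Free parameters under three schemes, counted as described in the text."""
    return {
        # one branch parameter for every branch/character combination
        "TS97": n_chars * n_branches(n_taxa),
        # one shared branch parameter plus an ancestral state at every
        # interior node for every character (incidental parameters)
        "G90": 1 + n_chars * n_interior_nodes(n_taxa),
        # branch lengths shared by all characters (structural parameters only)
        "shared_branch_lengths": n_branches(n_taxa),
    }


for m in (10, 100, 1000):
    print(m, parameter_counts(n_taxa=10, n_chars=m))
```

For 10 taxa, the TS97 and G90 counts grow in proportion to the number of characters, whereas the shared-branch-length count stays at 17 however many characters are added. A fixed number of parameters is exactly what the standard consistency results mentioned by Tuffley and Steel presuppose.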
In contrast, the ancestral states estimated in the G90 model are incidental parameters, since each is used only in calculating the likelihood of a single character. Likewise, the branch probability parameters of the TS97 model are incidental parameters because each is used in computing the likelihood for only one character. Models incorporating incidental parameters are susceptible to statistical inconsistency, and Goldman (1990) noted that incidental parameters can make estimates of the structural parameters in the model inconsistent as well. There is a growing tendency to discount the importance of statistical consistency in phylogeny inference (e.g., Farris, 1999); however, it seems prudent to avoid, where possible, models that may be statistically inconsistent even when their assumptions are not violated. The G90 and TS97 parsimony models both have this property. The purpose of this paper is to discuss the applicability of ML phylogeny inference to discrete morphological data. The TS97 model provides an excellent point of comparison because it gives results identical to parsimony, currently the only option for phylogenetic analyses of discrete morphological characters. The G90 model is less attractive for comparison because its assumption of equal branch lengths and its estimation of ancestral states make it substantially different from the models currently used in phylogenetics for sequence data. In the terminology of Steel and Penny (2000), TS97 and the standard substitution models used for sequence data are all “maximum average likelihood” methods, whereas G90 belongs to a different class, the “most-parsimonious likelihood” methods. In this paper, I strongly emphasize avoiding incidental parameters so that the model will be well formulated and statistically well behaved.
2001 LEWIS: MAXIMUM LIKELIHOOD MORPHOLOGICAL PHYLOGENY 915
I also show that standard Markov models, that is, generalizations of the Jukes and Cantor (1969; JC69) model, represent modified versions of the TS97 model and avoid the aforementioned problems with incidental parameters that can lead to statistical inconsistency. Discussion will center on whether the modifications necessary to make the TS97 model statistically sound are biologically justified. I conclude with a discussion of interesting extensions to the basic model and touch on the wealth of opportunities that model-based approaches open up for systematic biologists.
A BASIC LIKELIHOOD MODEL FOR DISCRETE MORPHOLOGICAL
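The preceding paragraph describes the standard Markov models as generalizations of JC69. The following is a hedged sketch of what that generalization can look like. The equal-rates form and the rate scaling are assumptions of this example, not a transcription of the paper's own model. Under a k-state analogue of JC69, a branch of length t (in expected changes per character) has P_ii(t) = 1/k + ((k − 1)/k) exp[−kt/(k − 1)] and P_ij(t) = (1/k)(1 − exp[−kt/(k − 1)]) for i ≠ j. The code checks this closed form against an independent series evaluation of exp(Qt). All function names are hypothetical.

```python
import math

Matrix = list[list[float]]


def rate_matrix(k: int) -> Matrix:
    """Q for an assumed k-state equal-rates model: every state is left at
    total rate 1, so branch length t is expected changes per character."""
    beta = 1.0 / (k - 1)
    return [[-1.0 if i == j else beta for j in range(k)] for i in range(k)]


def transition_closed_form(t: float, k: int) -> Matrix:
    """P(t) = exp(Qt) in closed form; identical to JC69 when k = 4."""
    decay = math.exp(-k * t / (k - 1))
    same = 1.0 / k + (k - 1) / k * decay
    diff = (1.0 - decay) / k
    return [[same if i == j else diff for j in range(k)] for i in range(k)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(a[i][r] * b[r][j] for r in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def expm_series(q: Matrix, t: float, terms: int = 60) -> Matrix:
    """exp(Qt) by truncated Taylor series, sum of (Qt)^n / n!; adequate only
    for small k and moderate t, and used here purely as a check."""
    k = len(q)
    result = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    term = [row[:] for row in result]
    for n in range(1, terms):
        term = [[x * t / n for x in row] for row in matmul(term, q)]
        result = [[r + s for r, s in zip(r_row, s_row)]
                  for r_row, s_row in zip(result, term)]
    return result


for k in (2, 3, 4):
    for t in (0.05, 0.5, 2.0):
        p = transition_closed_form(t, k)
        check = expm_series(rate_matrix(k), t)
        err = max(abs(x - y) for pr, cr in zip(p, check) for x, y in zip(pr, cr))
        print(f"k={k} t={t}: max |closed form - series| = {err:.1e}")
```

For k = 4, the closed form reduces to the familiar JC69 probabilities. The only parameter here, the branch length, enters the likelihood of every character, so a model built this way has no incidental parameters in the sense discussed above.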

REFERENCES

[1] A. Gajdos et al. 1972. [Evolution of protein molecules. I. Protein synthesis]. La Nouvelle Presse Médicale.

[2] M. Pagel. 1994. Detecting correlated evolution on phylogenies: A general method for the comparative analysis of discrete characters. Proceedings of the Royal Society of London, Series B: Biological Sciences.

[3] D. Schluter et al. 1997. Likelihood of ancestor states in adaptive radiation. Evolution.

[4] J. Felsenstein. 1978. Cases in which parsimony or compatibility methods will be positively misleading.

[5] W. P. Maddison et al. 1995. Calculating the probability distributions of ancestral states reconstructed by parsimony on phylogenetic trees.

[6] G. A. Churchill et al. 1992. Sample size for a phylogenetic inference. Molecular Biology and Evolution.

[7] M. P. Cummings et al. 2000. PAUP*: Phylogenetic analysis using parsimony (*and other methods), Version 4.

[8] P. Lewis et al. 1998. A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data. Molecular Biology and Evolution.

[9] J. Reeves et al. 1992. Heterogeneity in the substitution process of amino acid sites of proteins coded for by mitochondrial DNA. Journal of Molecular Evolution.

[10] B. Larget et al. 2000. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees.

[11] B. Rannala et al. 1997. Bayesian phylogenetic inference using DNA sequences: A Markov chain Monte Carlo method. Molecular Biology and Evolution.

[12] M. Steel et al. 2000. Invariable sites models and their use in phylogeny reconstruction. Systematic Biology.

[13] D. Schluter et al. 1999. Reconstructing ancestor states with maximum likelihood: Support for one- and two-rate models.

[14] J. Felsenstein et al. 1996. A hidden Markov model approach to variation among sites in rate of evolution. Molecular Biology and Evolution.

[15] G. Churchill et al. 1996. The reconstruction of ancestral character states. Evolution.

[16] H. Reeve et al. 1994. Using phylogenies to test hypotheses of adaptation: A critique of some current proposals. Evolution.

[17] J. Felsenstein. 1973. Maximum-likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics.

[18] M. Steel et al. 1997. Links between maximum likelihood and maximum parsimony under a simple model of site substitution.

[19] D. Pearl et al. 2001. Stochastic search strategy for estimation of maximum likelihood phylogenetic trees. Systematic Biology.

[20] M. J. Sanderson et al. 1997. Homoplasy: The recurrence of similarity in evolution.

[21] J. P. Huelsenbeck et al. 1996. A likelihood ratio test to detect conflicting phylogenetic signal.

[22] Z. Yang et al. 1993. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Molecular Biology and Evolution.

[23] J. Felsenstein et al. 1981. A likelihood approach to character weighting and what it tells us about parsimony and compatibility.

[24] M. A. Newton et al. 1999. Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Biometrics.

[25] J. Neyman. 1971. Molecular studies of evolution: A source of novel statistical problems.

[26] Z. Yang. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods. Journal of Molecular Evolution.

[27] H. Matsuda et al. 1994. fastDNAmL: A tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Computer Applications in the Biosciences.

[28] J. Felsenstein et al. 1992. Phylogenies from restriction sites: A maximum-likelihood approach. Evolution.

[29] R. Belshaw et al. 1999. Incongruence between morphological data sets: An example from the evolution of endoparasitism among parasitic wasps (Hymenoptera: Braconidae).

[30] B. Rannala et al. 2000. Accommodating phylogenetic uncertainty in evolutionary studies. Science.

[31] N. Reid et al. 1993. Likelihood.

[32] J. Farris. 1999. Likelihood and inconsistency. Cladistics.

[33] J. Farris. 1973. A probability model for inferring evolutionary trees.

[34] D. Penny et al. 2000. Parsimony, likelihood, and the role of models in molecular phylogenetics. Molecular Biology and Evolution.

[35] B. Rannala et al. 1997. Phylogenetic methods come of age: Testing hypotheses in an evolutionary context. Science.

[36] M. Pagel. 1999. The maximum likelihood approach to reconstructing ancestral character states of discrete characters on phylogenies.

[37] N. Goldman et al. 1990. Maximum likelihood inference of phylogenetic trees, with special reference to a Poisson process model of DNA substitution and to parsimony analyses.