Estimating trees from filtered data: identifiability of models for morphological phylogenetics.

As an alternative to parsimony analyses, stochastic models for morphological characters have been proposed (Lewis, 2001; Nylander et al., 2004) so that maximum likelihood or Bayesian methods can be used for phylogenetic inference. A key feature of these models is that they account for ascertainment bias: only variable characters, or only parsimony-informative characters, are observed. However, statistical consistency of such model-based inference requires that the model parameters be identifiable from the joint distribution they entail, and this issue has not previously been addressed. Here we prove that the parameters of several such models, with finite state spaces of arbitrary size, are identifiable provided the tree has at least eight leaves. If the tree topology is already known, seven leaves suffice for identifiability of the numerical parameters. The proof first uses phylogenetic invariants to infer the full joint distribution of site patterns, both parsimony-informative and non-informative, from the parsimony-informative pattern probabilities. We also investigate the failure of identifiability of the tree parameter for four-taxon trees.
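To make the ascertainment-bias correction concrete, here is a minimal Python sketch, not the authors' code. It assumes the symmetric k-state Mk model of Lewis (2001), a four-leaf tree ((a,b),(c,d)) with hypothetical branch lengths, and a uniform root distribution; all function names are illustrative. It computes every leaf-pattern probability by brute-force summation over the two internal node states, then conditions on the class of patterns that can be observed (variable only, or parsimony-informative only) by renormalizing. A four-leaf tree is used only to keep the enumeration small; the identifiability results above require larger trees.

```python
import itertools
import math
from collections import Counter


def mk_transition(k, t):
    """Transition matrix of the symmetric k-state Mk model for branch length t
    (expected number of changes per character)."""
    e = math.exp(-k * t / (k - 1))
    same = 1.0 / k + (k - 1.0) / k * e
    diff = 1.0 / k - e / k
    return [[same if i == j else diff for j in range(k)] for i in range(k)]


def quartet_pattern_probs(k, t_a, t_b, t_mid, t_c, t_d):
    """Joint probability of every leaf pattern (x_a, x_b, x_c, x_d) on the
    quartet ((a,b),(c,d)), rooted at the internal node u joining a and b,
    with a uniform root distribution. Brute force over internal states u, v."""
    pa, pb, pm, pc, pd = (mk_transition(k, t) for t in (t_a, t_b, t_mid, t_c, t_d))
    probs = {}
    for xa, xb, xc, xd in itertools.product(range(k), repeat=4):
        total = 0.0
        for su in range(k):
            for sv in range(k):
                total += (pa[su][xa] * pb[su][xb] * pm[su][sv]
                          * pc[sv][xc] * pd[sv][xd]) / k
        probs[(xa, xb, xc, xd)] = total
    return probs


def is_variable(pattern):
    """True unless every leaf shows the same state."""
    return len(set(pattern)) > 1


def is_parsimony_informative(pattern):
    """True if at least two states each occur at two or more leaves."""
    return sum(1 for c in Counter(pattern).values() if c >= 2) >= 2


def condition_on(probs, keep):
    """Ascertainment-bias correction: restrict to the patterns that can be
    observed (those passing `keep`) and renormalize them to sum to 1."""
    kept = {p: v for p, v in probs.items() if keep(p)}
    z = sum(kept.values())
    return {p: v / z for p, v in kept.items()}


if __name__ == "__main__":
    # Hypothetical branch lengths, binary characters (k = 2).
    full = quartet_pattern_probs(2, 0.1, 0.2, 0.05, 0.1, 0.3)
    informative = condition_on(full, is_parsimony_informative)
    for pattern, p in sorted(informative.items()):
        print(pattern, round(p, 4))
```

With k = 2 on four leaves, a pattern is parsimony-informative only when each state appears exactly twice, so the filtered distribution is supported on 6 of the 16 possible patterns; with k = 3 it is supported on 18 of 81.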

[1] John A. Rhodes, et al. Identifying evolutionary trees and substitution parameters for the general Markov model with invariable sites. 2008, Mathematical Biosciences.

[2] H. Núñez, et al. Mesozoic Fishes 4 – Homology and Phylogeny. 2008.

[3] J. Farris. A Probability Model for Inferring Evolutionary Trees. 1973.

[4] J. Huelsenbeck, et al. Bayesian phylogenetic analysis of combined data. 2004, Systematic Biology.

[5] Susanne Schulmeister, et al. Inconsistency of maximum parsimony revisited. 2004, Systematic Biology.

[6] S. Gupta, et al. Statistical decision theory and related topics IV. 1988.

[7] W. Massey. A basic course in algebraic topology. 1991.

[8] A. Wald. Note on the Consistency of the Maximum Likelihood Estimate. 1949.

[9] T. Jukes. Chapter 24: Evolution of Protein Molecules. 1969.

[10] Martina Ramirez. Homology as a parsimony problem: a dynamic homology approach for morphological data. 2007, Cladistics: The International Journal of the Willi Hennig Society.

[11] J. Felsenstein, et al. An evolutionary model for maximum likelihood alignment of DNA sequences. 1991, Journal of Molecular Evolution.

[12] P. Hofstaetter. [Similarity]. 2020, Psyche.

[13] B. Rannala. Identifiability of Parameters in MCMC Bayesian Inference of Phylogeny. 2002.

[14] W. Hennig. Phylogenetic Systematics. 2002.

[15] P. Lewis. A likelihood approach to estimating phylogeny from discrete morphological character data. 2001, Systematic Biology.

[16] Joseph T. Chang, et al. Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. 1996, Mathematical Biosciences.

[17] John P. Huelsenbeck, et al. MrBayes 3: Bayesian phylogenetic inference under mixed models. 2003, Bioinformatics.

[18] P. Sereno. Logical basis for morphological characters in phylogenetics. 2007, Cladistics: The International Journal of the Willi Hennig Society.

[19] Olivier Gascuel, et al. Reconstructing Evolution: New Mathematical and Computational Advances. 2007.

[20] E. Allman, et al. Phylogenetic invariants for the general Markov model of sequence mutation. 2003, Mathematical Biosciences.

[21] Michael D. Hendy, et al. Parsimony Can Be Consistent. 1993.

[22] H. Munro, et al. Mammalian protein metabolism. 1964.

[23] Elizabeth S. Allman, et al. Phylogenetic ideals and varieties for the general Markov model. 2004, Adv. Appl. Math.

[24] Igor B. Rogozin, et al. In search of lost introns. 2007, ISMB/ECCB.

[25] Mathieu Blanchette, et al. Exact and Heuristic Algorithms for the Indel Maximum Likelihood Problem. 2007, J. Comput. Biol.

[26] P. Buneman. The Recovery of Trees from Measures of Dissimilarity. 1971.

[27] Joseph Felsenstein, et al. Phylogenies from restriction sites: a maximum-likelihood approach. 1992, Evolution; International Journal of Organic Evolution.

[28] M. Steel. Recovering a tree from the leaf colourations it generates under a Markov model. 1994.

[29] O. Rieppel, et al. The Poverty of Taxonomic Characters. 2007.

[30] E. Mayr, et al. Methods and Principles of Systematic Zoology. 1953.

[31] J. Felsenstein. Cases in which Parsimony or Compatibility Methods will be Positively Misleading. 1978.

[32] László A. Székely, et al. Reconstructing Trees When Sequence Sites Evolve at Variable Rates. 1994, J. Comput. Biol.

[33] A. Henderson. Phylogenetic analysis of morphological data. 2002, Brittonia.

[34] J. Neyman. Molecular Studies of Evolution: A Source of Novel Statistical Problems. 1971.

[35] J. Felsenstein, et al. Invariants of phylogenies in a simple case with discrete states. 1987.

[36] J. A. Cavender. Taxonomy with confidence. 1978.