Effects of missing data on topological inference using a Total Evidence approach.

To fully understand macroevolutionary patterns and processes, we need to include both extant and extinct species in our models. This requires phylogenetic trees with both living and fossil taxa at the tips. One way to infer such phylogenies is the Total Evidence approach, which combines molecular data from living taxa with morphological data from both living and fossil taxa. Although the Total Evidence approach is very promising, it requires a great deal of data that can be hard to collect. Therefore, this method is likely to suffer from missing data, which may affect its ability to infer correct phylogenies. Here we use simulations to assess the effects of missing data on tree topologies inferred from Total Evidence matrices. We investigate three major factors that directly affect the completeness and the size of the morphological part of the matrix: the proportion of living taxa with no morphological data, the amount of missing data in the fossil record, and the overall number of morphological characters in the matrix. We infer phylogenies from complete matrices and from matrices with various amounts of missing data, and then compare the topologies inferred from the matrices with missing data to the "best" tree topology inferred from the complete matrix. We find that the number of living taxa with morphological characters and the overall number of morphological characters in the matrix are more important than the amount of missing data in the fossil record for recovering the "best" tree topology. Therefore, we suggest that sampling effort should be focused on morphological data collection for living species to increase the accuracy of topological inference in a Total Evidence framework. Additionally, we find that Bayesian methods consistently outperform other tree inference methods. We therefore recommend using Bayesian consensus trees to fix the tree topology prior to further analyses.
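The three factors described above can be illustrated with a minimal Python sketch of how missing data might be introduced into the morphological part of a matrix. This is an illustration under stated assumptions, not the procedure used in the study: the function name `degrade_matrix`, its parameters, the dictionary layout of the matrix, and the use of "?" for a missing state are all hypothetical choices made for this example.

```python
import random

MISSING = "?"  # assumed symbol for a missing character state


def degrade_matrix(matrix, living, fossils, p_living_removed,
                   p_fossil_missing, n_characters, seed=0):
    """Return a copy of a morphological matrix with missing data added.

    matrix: dict mapping taxon name -> list of character states.
    All names and parameters here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    total = len(next(iter(matrix.values())))

    # Factor 3: overall number of morphological characters kept.
    kept = sorted(rng.sample(range(total), n_characters))
    reduced = {t: [states[i] for i in kept] for t, states in matrix.items()}

    # Factor 1: a proportion of living taxa lose all morphological data.
    n_removed = round(p_living_removed * len(living))
    for taxon in rng.sample(sorted(living), n_removed):
        reduced[taxon] = [MISSING] * n_characters

    # Factor 2: a proportion of cells in the fossil rows become missing.
    cells = [(t, i) for t in sorted(fossils) for i in range(n_characters)]
    for taxon, i in rng.sample(cells, round(p_fossil_missing * len(cells))):
        reduced[taxon][i] = MISSING

    return reduced


# Toy example: 4 living taxa, 2 fossils, 10 binary characters.
living = ["L1", "L2", "L3", "L4"]
fossils = ["F1", "F2"]
full = {t: [str((i + k) % 2) for i in range(10)]
        for k, t in enumerate(living + fossils)}
degraded = degrade_matrix(full, living, fossils, 0.5, 0.25, 8)
for taxon, states in degraded.items():
    print(taxon, "".join(states))
```

Each degraded matrix would then be passed to a tree inference program, and the resulting topology compared with the one obtained from the complete matrix.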
