Missing data, incomplete taxa, and phylogenetic accuracy.

The problem of missing data is often considered to be the most important obstacle in reconstructing the phylogeny of fossil taxa and in combining data from diverse characters and taxa for phylogenetic analysis. Empirical and theoretical studies show that including highly incomplete taxa can lead to multiple equally parsimonious trees, poorly resolved consensus trees, and decreased phylogenetic accuracy. However, the mechanisms that cause incomplete taxa to be problematic have remained unclear. It has been widely assumed that incomplete taxa are problematic because of the proportion or amount of missing data that they bear. In this study, I use simulations to show that the reduced accuracy associated with including incomplete taxa is caused by these taxa bearing too few complete characters rather than too many missing data cells. This seemingly subtle distinction has a number of important implications. First, the so-called missing data problem for incomplete taxa is, paradoxically, not directly related to their amount or proportion of missing data. Thus, the level of completeness alone should not guide the exclusion of taxa (contrary to common practice), and these results may explain why empirical studies have sometimes found little relationship between the completeness of a taxon and its impact on an analysis. These results also (1) suggest a more effective strategy for dealing with incomplete taxa, (2) call into question a justification of the controversial phylogenetic supertree approach, and (3) show the potential for the accurate phylogenetic placement of highly incomplete taxa, both when combining diverse data sets and when analyzing relationships of fossil taxa.
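The distinction between the proportion of missing data and the number of characters actually scored can be made concrete with a small calculation. The sketch below is only an illustration of that distinction, not the simulation design used in the study. The `?` symbol for a missing cell, the `completeness` helper, and the two example taxa with their matrix sizes are all hypothetical choices made for this example.

```python
from typing import Sequence, Tuple

MISSING = "?"  # assumed symbol for a missing data cell


def completeness(row: Sequence[str]) -> Tuple[int, float]:
    """Return (number of scored characters, proportion of missing cells)."""
    n_missing = sum(1 for cell in row if cell == MISSING)
    n_scored = len(row) - n_missing
    return n_scored, n_missing / len(row)


# Two hypothetical taxa, each 90% incomplete, drawn from matrices
# with very different total numbers of characters.
taxon_small = ["0"] * 10 + [MISSING] * 90      # 100 characters in total
taxon_large = ["1"] * 200 + [MISSING] * 1800   # 2000 characters in total

small_stats = completeness(taxon_small)
large_stats = completeness(taxon_large)

print("small matrix:", small_stats)
print("large matrix:", large_stats)
```

Both taxa have the same proportion of missing cells, yet one is scored for 10 characters and the other for 200. Under the argument of the abstract, it is this second quantity, not the shared proportion, that would be expected to matter for accurate placement.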
