Phylogenomics of Annelida revisited: a cladistic approach using genome‐wide expressed sequence tag data mining and examining the effects of missing data

We present phylogenomic analyses of the most comprehensive molecular character set compiled for Annelida and its constituent taxa, comprising over 347 000 aligned nucleotide sites for 39 taxa. The nucleotide data set was recovered by using a pre‐existing amino acid data set of almost 48 000 aligned sites as a backbone for tBLASTn searches against NCBI. In addition, the orthology assignments of the loci in the original amino acid data set were scrutinized with an all‐versus‐all reciprocal best hit approach using BLASTp, and the loci were examined for statistical interdependency. This approach revealed considerable sequence redundancy among the loci of the original data set, so a new data set was compiled with the redundancy removed. The newly compiled nucleotide data set, the original amino acid data set, and the new, reduced amino acid data set were subjected to parsimony analyses and two forms of bootstrap resampling; the last of these was also analysed under maximum likelihood. The analyses had two main objectives: (i) to examine the general topology, including support, resulting from the analyses of the new data sets and (ii) to assess the consistency of branching patterns across optimality criteria by comparison with previous probabilistic analyses. The phylogenetic hypotheses resulting from analyses of the three data sets are largely unsupported, reflecting the continued difficulty of finding numerous, reliable, and suitable loci for a group as ancient as Annelida. The resulting parsimony hypotheses disagree in some respects with those of the previous probabilistic analyses: Sedentaria and, in most cases, Errantia are not supported as monophyletic, whereas Pleistoannelida is recovered as an (unsupported) monophyletic group in one of the three parsimony analyses and in the likelihood analysis.
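
The all‐versus‐all reciprocal best hit (RBH) criterion used to re‐examine orthology can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes BLASTp results in the default tabular format (`-outfmt 6`, with the bitscore in the 12th column) and sequence identifiers of the hypothetical form `locus|taxon`. All function names are illustrative, and the test for statistical interdependency among loci is not shown.

```python
def parse_outfmt6(lines):
    """Yield (query, subject, bitscore) from BLAST tabular (-outfmt 6) lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        # Default outfmt 6 has 12 columns; the bitscore is the last one.
        yield fields[0], fields[1], float(fields[11])


def best_hits(hits):
    """Map each query to its highest-scoring subject, ignoring self-hits.

    Ties keep the first hit encountered.
    """
    best = {}
    for query, subject, score in hits:
        if query == subject:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(hits):
    """Return sorted unordered pairs (a, b) where a's best hit is b and vice versa."""
    best = best_hits(hits)
    pairs = set()
    for a, b in best.items():
        if best.get(b) == a:
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


def flag_redundant_loci(rbh_pairs):
    """Hypothetical redundancy check, assuming IDs look like 'locus|taxon'.

    An RBH that links sequences assigned to two different loci suggests
    that those loci may in fact represent the same gene.
    """
    flagged = set()
    for a, b in rbh_pairs:
        locus_a, locus_b = a.split("|")[0], b.split("|")[0]
        if locus_a != locus_b:
            flagged.add(tuple(sorted((locus_a, locus_b))))
    return sorted(flagged)
```

Feeding the all‐versus‐all BLASTp table through `reciprocal_best_hits` and then `flag_redundant_loci` yields candidate pairs of loci to inspect for redundancy. Tied bitscores are resolved naively here; a real pipeline would want to handle them explicitly.
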
In addition, we performed missing data titration studies to estimate the impact of missing data on overall support and support for specific clades.
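
The titration protocol is not spelled out here. One common way to set such a study up, assumed for this sketch, is to filter alignment columns at progressively stricter occupancy thresholds and reanalyse each filtered matrix, tracking how support changes. The code below (hypothetical function names; gaps `-` and unknowns `?` treated as missing) builds only the filtered matrices and reports their size and missing‐data fraction. The tree searches and support calculations would be run separately.

```python
def column_occupancy(alignment, missing="-?"):
    """Fraction of taxa with a non-missing character at each alignment column."""
    seqs = list(alignment.values())
    n_taxa = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    return [sum(1 for s in seqs if s[i] not in missing) / n_taxa
            for i in range(length)]


def titrate(alignment, thresholds, missing="-?"):
    """For each occupancy threshold, keep columns at or above it.

    Returns a list of (threshold, n_sites_kept, fraction_missing, filtered_alignment).
    """
    occ = column_occupancy(alignment, missing)
    n_taxa = len(alignment)
    series = []
    for t in thresholds:
        keep = [i for i, o in enumerate(occ) if o >= t]
        filtered = {taxon: "".join(seq[i] for i in keep)
                    for taxon, seq in alignment.items()}
        cells = n_taxa * len(keep)
        missing_cells = sum(ch in missing for seq in filtered.values() for ch in seq)
        frac_missing = missing_cells / cells if cells else 0.0
        series.append((t, len(keep), frac_missing, filtered))
    return series
```

Titrating by locus or by taxon completeness instead of by column follows the same pattern, with occupancy computed per locus or per taxon rather than per site.
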
