Resolving arthropod phylogeny: exploring phylogenetic signal within 41 kb of protein-coding nuclear gene sequence.

This study attempts to resolve relationships among and within the four basal arthropod lineages (Pancrustacea, Myriapoda, Euchelicerata, Pycnogonida) and to assess the widespread expectation that remaining phylogenetic problems will yield to increasing amounts of sequence data. Sixty-eight regions of 62 protein-coding nuclear genes (approximately 41 kilobases (kb) per taxon) were sequenced for 12 taxonomically diverse arthropod taxa and a tardigrade outgroup. Parsimony, likelihood, and Bayesian analyses of total nucleotide data generally strongly supported the monophyly of each of the basal lineages represented by more than one species. Other relationships within the Arthropoda were also supported, with support levels depending on the method of analysis and on the inclusion or exclusion of synonymous changes. Removing third codon positions, where the assumption of base compositional homogeneity was rejected, altered the results. Removing the final class of synonymous mutations (first codon positions encoding leucine and arginine, which were also compositionally heterogeneous) yielded a data set that was consistent with a hypothesis of base compositional homogeneity. Furthermore, under such a data-exclusion regime, all 68 gene regions individually were consistent with base compositional homogeneity. Restricting likelihood analyses to nonsynonymous change recovered trees with strong support for the basal lineages but not for other groups that were variably supported with more inclusive data sets. In a further effort to increase phylogenetic signal, three types of data exploration were undertaken. (1) Individual genes were ranked by their average rate of nonsynonymous change, and three rate categories were assigned: fast, intermediate, and slow. Then, bootstrap analysis of each gene was performed separately to see which taxonomic groups received strong support. Five taxonomic groups were strongly supported independently by two or more genes, and these genes mostly belonged to the slow or intermediate categories, whereas groups supported only by a single gene region tended to be from genes of the fast category, arguing that fast genes provide a less consistent signal. (2) A sensitivity analysis was performed in which increasing numbers of genes were excluded, beginning with the fastest. The number of strongly supported nodes increased up to a point and then decreased slightly. Recovery of Hexapoda required removal of fast genes. Support for Mandibulata (Pancrustacea + Myriapoda) also increased, at times to "strong" levels, with removal of the fastest genes. (3) Concordance selection was evaluated by clustering genes according to their ability to recover Pancrustacea, Euchelicerata, or Myriapoda and analyzing the three clusters separately. All clusters of genes recovered the three concordance clades but were at times inconsistent in the relationships recovered among and within these clades, a result indicating that the a priori concordance criteria may bias phylogenetic signal in unexpected ways. In a further attempt to increase support for taxonomic relationships, sequence data from 49 additional taxa for three slow genes (i.e., EF-1 alpha, EF-2, and Pol II) were combined with the various 13-taxon data sets. The 62-taxon analyses supported the results of the 13-taxon analyses and provided increased support for additional pancrustacean clades found in an earlier analysis including only EF-1 alpha, EF-2, and Pol II.
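The data-exclusion regime described above (dropping third codon positions, then dropping first codon positions that encode leucine or arginine) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the input format (a dict of in-frame, equal-length nucleotide strings), and the rule that a first-position column is dropped whenever any taxon has a leucine or arginine codon at that site are all assumptions made for illustration. The codon sets are those of the standard genetic code.

```python
# Hypothetical sketch of the data-exclusion regime; names and rules are assumptions.

# Leucine and arginine codons in the standard genetic code. These are the
# codons whose first position can change without changing the amino acid.
LEU_ARG_CODONS = frozenset({
    "TTA", "TTG", "CTT", "CTC", "CTA", "CTG",  # leucine
    "CGT", "CGC", "CGA", "CGG", "AGA", "AGG",  # arginine
})


def kept_columns(alignment):
    """Return indices of alignment columns retained after exclusion.

    `alignment` maps taxon name -> aligned, in-frame nucleotide string.
    """
    if not alignment:
        return []
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs) or length % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")

    keep = []
    for start in range(0, length, 3):
        codons = {s[start:start + 3] for s in seqs}
        # Assumption: drop the first position if ANY taxon has Leu/Arg here.
        if not (codons & LEU_ARG_CODONS):
            keep.append(start)
        # Second positions are always kept; third positions are always dropped.
        keep.append(start + 1)
    return keep


def exclude_synonymous_prone_sites(alignment):
    """Return a new alignment containing only the retained columns."""
    cols = kept_columns(alignment)
    return {taxon: "".join(seq[i] for i in cols)
            for taxon, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"taxonA": "ATGCTGAAA", "taxonB": "ATGTTGAAG"}  # made-up sequences
    print(kept_columns(toy))
    print(exclude_synonymous_prone_sites(toy))
```

In the toy alignment the middle codon is leucine in both taxa (CTG and TTG), so only its second position survives. The two other codons keep their first and second positions.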
