DNA Barcodes Combined with Multilocus Data of Representative Taxa Can Generate Reliable Higher-Level Phylogenies

Abstract: Taxa are frequently labeled incertae sedis when their placement is debated at ranks above the species level, such as subgenus, genus, or subtribe. This problem is pervasive in groups with complex systematics because suitable synapomorphies are difficult to identify. In this study, we propose combining DNA barcodes with a multilocus backbone phylogeny in order to assign taxa to genus or other higher-level categories. This sampling strategy generates molecular matrices containing large amounts of missing data that are not distributed randomly: barcodes are sampled for all representatives, and additional markers are sampled only for a small percentage. We investigate the effects of the degree and randomness of missing data on phylogenetic accuracy using simulations for up to 100 markers in 1000-tip trees, as well as a real case: the subtribe Polyommatina (Lepidoptera: Lycaenidae), a large group including numerous species with unresolved taxonomy. Our simulations show that when species are strategically selected to represent higher-level categories for multigene sequencing (approximately one per simulated genus), adding multigene backbone data for as few as 5–10% of the specimens in the total data set can produce high-quality phylogenies, comparable to those resulting from 100% multigene sampling. In contrast, trees based exclusively on barcodes performed poorly. We applied this approach to a 1365-specimen data set of Polyommatina (including ca. 80% of described species), with nearly 8% of representative species included in the multigene backbone and the remaining 92% represented only by mitochondrial COI barcodes. The resulting phylogeny highlighted potential misplacements, unrecognized major clades, and placements for incertae sedis taxa. We use this information to make systematic rearrangements within Polyommatina and to describe two new genera. Finally, we propose a systematic workflow to assess higher-level taxonomy in hyperdiverse groups. This research identifies an additional, enhanced value of DNA barcodes for improvements in higher-level systematics using large data sets. [Birabiro; DNA barcoding; incertae sedis; Kipepeo; Lycaenidae; missing data; phylogenomic; phylogeny; Polyommatina; supermatrix; systematics; taxonomy]
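
To give a sense of how sparse such a matrix is, the following minimal sketch counts the missing cells when every tip has a barcode and only a small backbone has the additional markers. It is an illustration under stated assumptions, not the study's simulation pipeline: the function names, the choice of the first tips as the backbone, and the equal weighting of all markers (ignoring sequence length) are hypothetical. The tip and marker counts follow the largest simulation described above, and the 8% backbone fraction is borrowed from the Polyommatina data set.

```python
def sampling_matrix(n_tips, n_markers, backbone_fraction):
    """Presence/absence matrix: rows are tips, columns are markers.

    Column 0 stands for the barcode and is present for every tip.
    The remaining columns are present only for backbone tips.
    Assumption: the first tips stand in for the backbone; the study
    selects backbone species strategically (about one per genus).
    """
    n_backbone = round(n_tips * backbone_fraction)
    return [
        [True] + [tip < n_backbone] * (n_markers - 1)
        for tip in range(n_tips)
    ]


def missing_fraction(matrix):
    """Fraction of (tip, marker) cells with no data, all markers weighted equally."""
    cells = sum(len(row) for row in matrix)
    present = sum(sum(row) for row in matrix)
    return 1 - present / cells


# Hypothetical scenario: 1000 tips, 100 markers, 8% of tips in the backbone.
matrix = sampling_matrix(n_tips=1000, n_markers=100, backbone_fraction=0.08)
print(f"missing cells: {missing_fraction(matrix):.1%}")
```

Under these assumptions roughly nine out of ten cells are empty, and the empty cells are concentrated entirely in the non-backbone tips, which is the non-random pattern whose effect on accuracy the simulations evaluate.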
