DNA Barcoding of Recently Diverged Species: Relative Performance of Matching Methods

Recently diverged species are challenging to identify, yet they are frequently of special interest both scientifically and from a regulatory perspective. DNA barcoding has proven instrumental in species identification, especially in insects and vertebrates, but it has been reported to be problematic in some cases for recently diverged species. These problems stem mostly from incomplete lineage sorting or simply from the lack of a ‘barcode gap’, and are probably related to large effective population size and/or low mutation rate. Our objective was to compare six methods in their ability to correctly identify recently diverged species with DNA barcodes: neighbor joining and parsimony (both tree-based), nearest neighbor and BLAST (both similarity-based), and the diagnostic methods DNA-BAR and BLOG. We analyzed simulated data assuming three different effective population sizes, as well as three empirical data sets selected from published studies. As expected, success rates are significantly lower for recently diverged species (∼75%) than for older species (∼97%) (P<0.00001). Similarity-based and diagnostic methods significantly outperform tree-based methods when applied to simulated DNA barcode data (P<0.00001). The diagnostic method BLOG had the highest correct query identification rate on both simulated (86.2%) and empirical data (93.1%), indicating that it is a consistently better method overall. Another advantage of BLOG is that it offers species-level information that can be used outside the realm of DNA barcoding, for instance in species description or molecular detection assays. Although we can confirm that identification success based on DNA barcoding is generally high in our data, recently diverged species remain difficult to identify. Nevertheless, our results contribute to improved solutions for their accurate identification.
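
To make the similarity-based category concrete, the following is a minimal sketch of nearest-neighbor identification. It is an illustration only and not the implementation evaluated in this study. It assumes pre-aligned sequences of equal length, an uncorrected p-distance, and a rule that reports a query as ambiguous when the closest reference sequences belong to more than one species. The function names and the toy sequences are hypothetical.

```python
from typing import List, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') or an ambiguous base ('N') in either
    sequence are skipped (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def nearest_neighbor_id(
    query: str, reference: List[Tuple[str, str]]
) -> Tuple[Optional[str], float]:
    """Assign the query to the species of its closest reference sequence.

    `reference` is a list of (species_name, aligned_sequence) pairs.
    Returns (species, distance); species is None when the smallest
    distance is shared by sequences from different species.
    """
    if not reference:
        raise ValueError("reference library is empty")
    scored = [(p_distance(query, seq), species) for species, seq in reference]
    best = min(d for d, _ in scored)
    closest_species = {species for d, species in scored if d == best}
    if len(closest_species) == 1:
        return closest_species.pop(), best
    return None, best  # tie across species: identification is ambiguous


# Hypothetical toy reference library of two closely related species.
library = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "ACGTTCGTTC"),
    ("sp_B", "ACGTTCGTTT"),
]

clear_hit = nearest_neighbor_id("ACGTACGTAA", library)
ambiguous_hit = nearest_neighbor_id("ACGTACGTTC", library)
```

In this toy library the second query is equally close to a sequence of each species, which mirrors the situation in which no barcode gap separates recently diverged species and a similarity-based method cannot return a unique identification.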
