Taxonomic identification accuracy from BOLD and GenBank databases using over a thousand insect barcodes from Colombia

The recent rapid decline of insect populations has created a need for fast methods to assess their diversity and to process large volumes of data for species identification in highly diverse groups. A short DNA sequence from the mitochondrial cytochrome c oxidase subunit I (COI) gene is widely used to identify insects by comparing it against sequences from known species. Online sequence repositories provide tools for matching a query sequence to sequences from identified individuals, but the performance of these tools can differ. Here we assess the accuracy of insect identification at several taxonomic levels using two repositories, BOLD Systems and GenBank. We did this by comparing the taxonomist's identification of each sequence with the identification suggested by each platform. We used 1160 sequences from eight insect orders from Colombia. We then reanalyzed the results for a representative subset of the data from the subfamily Scarabaeinae (Coleoptera). Overall, BOLD Systems outperformed GenBank, and the performance of both engines varied among orders and among taxonomic levels (species, genus, and family). Identification accuracy was higher at the family and genus levels. BOLD was more accurate for Coleoptera at the family level, and for Coleoptera and Lepidoptera at the genus and species levels. The other orders performed similarly in both repositories. In the Scarabaeinae subset, species were correctly identified when the BOLD match percentage was above 93.4%, and overall 85% of the samples were correctly assigned to a taxonomic category. These results underscore the potential of identification engines to place insects accurately into their taxonomic categories based on DNA barcodes, and they support the reliable use of BOLD Systems for insect identification in a highly diverse country that lacks a large reference database.
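The comparison described above amounts to tallying, for each engine, insect order, and taxonomic level, how often the engine's suggested name agrees with the taxonomist's identification. The Python sketch below illustrates that tally under stated assumptions: the `Record` structure, its field names, exact string matching of names (no handling of synonyms), and the rule that a specimen counts only at the ranks the taxonomist identified are all hypothetical choices, not the authors' pipeline. The helper `accuracy_above` shows how accuracy could be recomputed for matches above a similarity cutoff, such as the 93.4% value reported for the Scarabaeinae subset; that value is specific to this subset and is not a general threshold.

```python
from collections import defaultdict
from dataclasses import dataclass

RANKS = ("family", "genus", "species")


@dataclass
class Record:
    """One barcode compared against one engine (hypothetical structure)."""
    order: str              # insect order, e.g. "Coleoptera"
    engine: str             # "BOLD" or "GenBank"
    expert: dict            # rank -> name assigned by the taxonomist (None if not identified)
    suggested: dict         # rank -> name taken from the engine's best match
    match_pct: float = 0.0  # percent similarity reported for the best match


def accuracy_by_rank(records):
    """Fraction of correct suggestions per (engine, order, rank)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        for rank in RANKS:
            if r.expert.get(rank) is None:
                continue  # taxonomist did not identify this specimen to this rank
            key = (r.engine, r.order, rank)
            totals[key] += 1
            if r.suggested.get(rank) == r.expert[rank]:
                hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


def accuracy_above(records, rank, threshold):
    """Accuracy at one rank, using only records whose match percentage exceeds `threshold`."""
    kept = [r for r in records
            if r.match_pct > threshold and r.expert.get(rank) is not None]
    if not kept:
        return None  # no records above the cutoff
    correct = sum(r.suggested.get(rank) == r.expert[rank] for r in kept)
    return correct / len(kept)


if __name__ == "__main__":
    # Invented toy records for illustration only; not data from the study.
    demo = [
        Record("Coleoptera", "BOLD",
               {"family": "Fam1", "genus": "GenA", "species": "GenA sp1"},
               {"family": "Fam1", "genus": "GenA", "species": "GenA sp1"}, 98.0),
        Record("Coleoptera", "BOLD",
               {"family": "Fam1", "genus": "GenA", "species": "GenA sp2"},
               {"family": "Fam1", "genus": "GenB", "species": "GenB sp9"}, 90.5),
    ]
    print(accuracy_by_rank(demo))
    print(accuracy_above(demo, "species", 93.4))
```

Keeping the tally keyed by engine, order, and rank mirrors the way the results are broken down in the abstract, so the same table can be sliced to compare BOLD and GenBank within a single order or level.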
