Testing the reliability of genetic methods of species identification via simulation.

Although genetic methods of species identification, especially DNA barcoding, are strongly debated, tests of these methods have been restricted to a few empirical cases for pragmatic reasons. Here we use simulation to test the performance of methods based on sequence comparison (BLAST and genetic distance) and tree topology over a wide range of evolutionary scenarios. Sequences were simulated on a range of gene trees spanning almost three orders of magnitude in tree depth and in coalescent depth; that is, deep or shallow trees with deep or shallow coalescences. When the query's conspecific sequences were included in the reference alignment, the rate of positive identification was related to the degree to which different species were genetically differentiated. The BLAST, distance, and liberal tree-based methods returned higher rates of correct identification than did the strict tree-based requirement that the query was within, but not sister to, a single-species clade. Under this more conservative approach, ambiguous outcomes occurred in inverse proportion to the number of reference sequences per species. When the query's conspecific sequences were not in the reference alignment, only the strict tree-based approach was relatively immune to making false-positive identifications. Thresholds affected the rates at which false-positive identifications were made when the query's species was unrepresented in the reference alignment but did not otherwise influence outcomes. A conservative approach using the strict tree-based method should be used initially in large-scale identification systems, with effort made to maximize sequence sampling within species. Once the genetic variation within a taxonomic group is well characterized and the taxonomy resolved, then the choice of method used should be dictated by considerations of computational efficiency. The requirement for extensive genetic sampling may render these techniques inappropriate in some circumstances.
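To make the distance-based approach and the role of the threshold concrete, the following is a minimal sketch rather than the authors' implementation. It assumes an uncorrected p-distance, a hypothetical threshold of 0.03, and illustrative function names (`p_distance`, `identify_by_distance`); the abstract does not specify the distance measure or the threshold values used in the study.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def identify_by_distance(query, references, threshold=0.03):
    """Assign the query to the species of its nearest reference sequence.

    references: list of (species_name, aligned_sequence) pairs.
    threshold:  assumed maximum distance for accepting a match (hypothetical value).
    Returns a species name, "ambiguous", or "no identification".
    """
    distances = [(p_distance(query, seq), species) for species, seq in references]
    best = min(d for d, _ in distances)

    # Beyond the threshold, decline to identify; this is the guard against
    # false positives when the query's species is absent from the references.
    if best > threshold:
        return "no identification"

    # If equally close references belong to different species, report ambiguity.
    nearest_species = {species for d, species in distances if d == best}
    if len(nearest_species) == 1:
        return nearest_species.pop()
    return "ambiguous"


# Toy reference alignment (made-up sequences, for illustration only).
references = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "TCGAACGGAC"),
]
print(identify_by_distance("ACGTACGTAC", references))
print(identify_by_distance("GGGGGGGGGG", references))
```

In this sketch the threshold only changes the outcome when the nearest reference is distant, which mirrors the abstract's observation that thresholds mattered chiefly when the query's species was unrepresented in the reference alignment.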
