Species Identification by Bayesian Fingerprinting: A Powerful Alternative to DNA Barcoding

A number of methods have been developed that use genetic sequence data to identify and delineate species. Some are heuristic, such as DNA barcoding, which relies on a sequence-distance threshold; others use Bayesian model comparison under the multispecies coalescent (MSC) model. Here we use mathematical analysis and computer simulation to demonstrate large differences in the statistical performance of species identification between DNA barcoding and Bayesian inference under the MSC model as implemented in the program BPP. We show that a fixed genetic-distance threshold, as used in DNA barcoding, is problematic for delimiting species even if the threshold is “optimized”. The reason is that different species have different population sizes and different divergence times, and therefore display different amounts of intra-species versus inter-species variation. In contrast, BPP can reliably delimit species in such situations with only one locus, and it rarely supports a wrong assignment with high posterior probability. Under-sampling or rare specimens may pose problems for heuristic methods. BPP, however, can delimit species with high power when multi-locus data are used, even if a species is represented by a single specimen. Finally, we demonstrate that BPP may be powerful for delimiting cryptic species using specimens that are misidentified as a single species in the barcoding library.
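The contrast between a fixed threshold and model-based comparison can be illustrated with a toy calculation. This is a minimal sketch, not BPP's method, and every parameter value in it is made up. It assumes the infinite-sites model and treats θ (= 4Nμ) and τ (= Tμ) as known.

Under the multispecies coalescent, two sequences from one species differ on average by θ. Sequences from sister species differ on average by 2τ + θ_anc, where θ_anc belongs to the ancestral population.

The hypothetical helper `threshold_window` shows why a fixed threshold fails. Take an old pair of species with large populations and a young pair with small populations: no single threshold separates both pairs, even in expectation.

The hypothetical helper `posterior_two_species` takes a different approach. It compares the one-species and two-species models by their likelihoods for per-locus difference counts. Per-locus parameters are the per-site values multiplied by the locus length. BPP itself instead works with full sequence alignments, places priors on θ and τ, and averages over gene trees by MCMC.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesPair:
    """Hypothetical sister species A and B. Parameters are per site:
    theta = 4*N*mu (population size), tau = T*mu (divergence time)."""
    theta_a: float
    theta_b: float
    theta_anc: float
    tau: float


def expected_within(pair: SpeciesPair) -> float:
    # Mean distance between two sequences from one species is theta;
    # take the larger of the two species' values.
    return max(pair.theta_a, pair.theta_b)


def expected_between(pair: SpeciesPair) -> float:
    # Lineages from A and B cannot coalesce before tau; each then waits
    # theta_anc/2 on average in the ancestor, giving 2*tau + theta_anc.
    return 2 * pair.tau + pair.theta_anc


def threshold_window(pairs):
    """(low, high) range of distance thresholds that separates every pair
    in expectation; the window is empty when low >= high."""
    low = max(expected_within(p) for p in pairs)
    high = min(expected_between(p) for p in pairs)
    return low, high


def log_p_within(k: int, theta: float) -> float:
    # Infinite sites, one population: going back in time, each event is a
    # mutation with prob theta/(1+theta), otherwise coalescence (geometric).
    return k * math.log(theta / (1 + theta)) - math.log(1 + theta)


def log_p_between(k: int, tau: float, theta_anc: float) -> float:
    # Differences = Poisson(2*tau) accumulated before tau plus a geometric
    # count in the ancestor; convolve the two in log space (requires tau > 0).
    terms = [
        -2 * tau + j * math.log(2 * tau) - math.lgamma(j + 1)
        + log_p_within(k - j, theta_anc)
        for j in range(k + 1)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def posterior_two_species(diffs, theta, tau, theta_anc, prior=0.5):
    """P(two species | per-locus difference counts) with fixed per-locus
    parameters; loci are treated as independent."""
    log_l1 = sum(log_p_between(k, tau, theta_anc) for k in diffs)
    log_l0 = sum(log_p_within(k, theta) for k in diffs)
    x = math.log(prior / (1 - prior)) + log_l1 - log_l0
    # Numerically stable logistic transform of the log posterior odds.
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    e = math.exp(x)
    return e / (1 + e)


if __name__ == "__main__":
    old_large = SpeciesPair(theta_a=0.02, theta_b=0.02, theta_anc=0.02, tau=0.02)
    young_small = SpeciesPair(theta_a=0.002, theta_b=0.002, theta_anc=0.002, tau=0.003)
    print(threshold_window([old_large]))               # non-empty window
    print(threshold_window([young_small]))             # non-empty window
    print(threshold_window([old_large, young_small]))  # low > high: none fits both
    # Young pair scaled to a 500-site locus: theta=1.0, tau=1.5, theta_anc=1.0.
    for diffs in ([4], [4, 4, 4]):
        print(diffs, posterior_two_species(diffs, 1.0, 1.5, 1.0))
```

Each model uses species-specific θ and τ. As a result, a count of four differences favors two species for the young pair even though that distance lies below the old pair's within-species mean. When three independent loci show the same pattern, the posterior rises further. This mirrors, in miniature, the gain from multi-locus data described in the abstract.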
