Coalescent-Based DNA Barcoding: Multilocus Analysis and Robustness

DNA barcoding assigns individuals to species using standardized mitochondrial sequences. Nuclear sequence data are sometimes combined with the mitochondrial data to increase assignment power. A Bayesian barcoding method, based on the coalescent model, is developed here for analysing mitochondrial and nuclear data together. The method is assessed on simulated and real data, and adding nuclear data is found to reduce the number of ambiguous assignments. Finally, simulations are used to study how robust coalescent-based barcoding is to departures from the model's assumptions. The method proves robust to past changes in population size, to population structure within species, and to sampling designs that cover the populations within a species poorly. Supplementary Material is available online at www.liebertonline.com/cmb.
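The abstract does not show how a posterior species assignment is computed. The block below is a minimal sketch of one way a Bayesian assignment can combine loci, and it is not the authors' implementation. It assumes the per-locus log-likelihoods log P(data | species) are already available (under the coalescent these require integrating over genealogies, which is omitted here). It also assumes unlinked loci, so their log-likelihoods add, and uses a hypothetical posterior threshold of 0.95 to flag an assignment as ambiguous. The function names and all numbers are invented for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def posterior_species(
    locus_loglik: Dict[str, List[float]],
    prior: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Posterior P(species | data) for a single query individual.

    locus_loglik[s] lists log P(data at locus l | query belongs to s),
    one entry per locus. Assumption: loci are unlinked, so their
    log-likelihoods are summed to give the multilocus log-likelihood.
    """
    species = list(locus_loglik)
    if prior is None:
        # Hypothetical default: uniform prior over candidate species.
        prior = {s: 1.0 / len(species) for s in species}
    log_post = {s: math.log(prior[s]) + sum(locus_loglik[s]) for s in species}
    norm = log_sum_exp(list(log_post.values()))
    return {s: math.exp(v - norm) for s, v in log_post.items()}


def assign(
    posterior: Dict[str, float], threshold: float = 0.95
) -> Tuple[Optional[str], float]:
    """Return (best species, its posterior), or (None, posterior) when the
    best posterior falls below the hypothetical ambiguity threshold."""
    best = max(posterior, key=posterior.get)
    p = posterior[best]
    return (best if p >= threshold else None), p


if __name__ == "__main__":
    # Made-up log-likelihoods: one mitochondrial locus, then a nuclear locus added.
    mito_only = {"A": [-10.0], "B": [-10.5]}
    with_nuclear = {"A": [-10.0, -20.0], "B": [-10.5, -24.0]}
    # Best posterior is about 0.62, below the threshold, so the species is None.
    print(assign(posterior_species(mito_only)))
    # Best posterior is about 0.99, so the query is assigned to "A".
    print(assign(posterior_species(with_nuclear)))
```

With these toy numbers, the mitochondrial locus alone leaves the query ambiguous. Adding a nuclear locus that separates the two candidate species more strongly pushes the posterior above the threshold. This illustrates, qualitatively, how extra loci can reduce the number of ambiguous assignments.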
