Bioinformatic challenges for DNA metabarcoding of plants and animals

Almost all empirical studies in ecology have to identify the species involved in the ecological process under examination. DNA metabarcoding, which couples the principles of DNA barcoding with next-generation sequencing technology, provides an opportunity to easily produce large amounts of data on biodiversity. Microbiologists have long used metabarcoding approaches, but the use of this technique to assess biodiversity in plant and animal communities is under‐explored. Despite its relationship with DNA barcoding, several unique features of DNA metabarcoding justify the development of specific data analysis methodologies. In this review, we describe the bioinformatics tools available for DNA metabarcoding of plants and animals, and we revisit others developed for DNA barcoding or microbial metabarcoding. We also discuss the principles and associated tools for evaluating and comparing DNA barcodes in the context of DNA metabarcoding, for designing new custom‐made barcodes adapted to specific ecological questions, for dealing with PCR and sequencing errors, and for inferring taxonomic data from sequences.
