From Benchtop to Desktop: Important Considerations when Designing Amplicon Sequencing Workflows

Amplicon sequencing has been the method of choice in many high-throughput DNA sequencing (HTS) applications. To date, there has been a heavy focus on how to analyse the burgeoning amount of data afforded by HTS. In contrast, little attention has been paid to sample preparation and the fidelity of library generation. No amount of high-end bioinformatics can compensate for poorly prepared samples; it is therefore imperative that careful attention is given to sample preparation and library generation within workflows, especially those involving multiple PCR steps. This paper redresses this imbalance by focusing on the benchtop aspects of typical amplicon workflows: sample screening, the target region, and library generation. Empirical data are provided to illustrate the scope of the problem. Lastly, the impact of various data analysis parameters is investigated in the context of how the data were initially generated. It is hoped that this paper will highlight the importance of pre-analysis workflows in achieving meaningful, future-proof data that can be analysed appropriately. As amplicon sequencing gains traction in a variety of diagnostic applications, from forensics to environmental DNA (eDNA), it is paramount that workflows and analytics are both fit for purpose.

[1]  P. Taberlet,et al.  Species detection using environmental DNA from water samples , 2008, Biology Letters.

[2]  S. Denman,et al.  Strategy for Modular Tagged High-Throughput Amplicon Sequencing , 2011, Applied and Environmental Microbiology.

[3]  Y. Benjamini,et al.  Summarizing and correcting the GC content bias in high-throughput sequencing , 2012, Nucleic acids research.

[4]  R. Knight,et al.  Fast UniFrac: Facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data , 2009, The ISME Journal.

[5]  François Pompanon,et al.  An In silico approach for the evaluation of DNA barcodes , 2010, BMC Genomics.

[6]  N. Lennon,et al.  Characterizing and measuring bias in sequence data , 2013, Genome Biology.

[7]  P.A.C.R. Costa,et al.  A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data , 2010, BMC Genomics.

[8]  Iraad F Bronner,et al.  Improved Protocols for Illumina Sequencing. , 2013, Current protocols in human genetics.

[9]  Björn Rotter,et al.  DNA fingerprinting in botany: past, present, future , 2014, Investigative Genetics.

[10]  S. Pääbo,et al.  DNA damage promotes jumping between templates during enzymatic amplification. , 1990, The Journal of biological chemistry.

[11]  Mehrdad Hajibabaei,et al.  Next‐generation sequencing technologies for environmental DNA research , 2012, Molecular ecology.

[12]  Michael Bunce,et al.  Metagenomic analyses of bacteria on human hairs: a qualitative assessment for applications in forensic science , 2014, Investigative Genetics.

[13]  A. Chiaradia,et al.  Pyrosequencing faecal DNA to determine diet of little penguins: is what goes in what comes out? , 2010, Conservation Genetics.

[14]  Paul Turner,et al.  Reagent contamination can critically impact sequence-based microbiome analyses , 2014, bioRxiv.

[15]  Rob Knight,et al.  UCHIME improves sensitivity and speed of chimera detection , 2011, Bioinform..

[16]  Russell J. Davenport,et al.  Removing Noise From Pyrosequenced Amplicons , 2011, BMC Bioinformatics.

[17]  Pavel Skums,et al.  Next-generation sequencing reveals large connected networks of intra-host HCV variants , 2014, BMC Genomics.

[18]  William A. Walters,et al.  Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample , 2010, Proceedings of the National Academy of Sciences.

[19]  David L. Wheeler,et al.  GenBank , 2015, Nucleic Acids Res..

[20]  M. Hindell,et al.  Studying Seabird Diet through Genetic Analysis of Faeces: A Case Study on Macaroni Penguins (Eudyptes chrysolophus) , 2007, PloS one.

[21]  Thomas LaFramboise,et al.  Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing , 2006, Nature Medicine.

[22]  Rob Knight,et al.  Metagenomics reveals sediment microbial community response to Deepwater Horizon oil spill , 2014, The ISME Journal.

[23]  Dáithí C. Murray,et al.  Scrapheap Challenge: A novel bulk-bone metabarcoding method to investigate ancient DNA in faunal assemblages , 2013, Scientific Reports.

[24]  Martin Hartmann,et al.  Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities , 2009, Applied and Environmental Microbiology.

[25]  Adam Hunter,et al.  Yabi: An online research environment for grid, high performance and cloud computing , 2012, Source Code for Biology and Medicine.

[26]  Alexander F. Auch,et al.  MEGAN analysis of metagenomic data. , 2007, Genome research.

[27]  D. Froese,et al.  Amplicon pyrosequencing late Pleistocene permafrost: the removal of putative contaminant sequences and small‐scale reproducibility , 2013, Molecular ecology resources.

[28]  H. Swerdlow,et al.  A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers , 2012, BMC Genomics.

[29]  James Haile,et al.  DNA-Based Faecal Dietary Analysis: A Comparison of qPCR and High Throughput Sequencing Approaches , 2011, PloS one.

[30]  William A. Walters,et al.  QIIME allows analysis of high-throughput community sequencing data , 2010, Nature Methods.

[31]  C. Wiuf,et al.  Monitoring endangered freshwater biodiversity using environmental DNA. , 2012, Molecular ecology.

[32]  A. Sajantila,et al.  Validation of high throughput sequencing and microbial forensics applications , 2014, Investigative Genetics.

[33]  P. Taberlet,et al.  Fifty Thousand Years of Arctic Vegetation and Megafaunal Diet 1 Reconstruction of Arctic Vegetation from Permafrost Samples 121 , 2022 .

[34]  James Haile,et al.  Deep Sequencing of Plant and Animal DNA Contained within Traditional Chinese Medicines Reveals Legality Issues and Health Safety Concerns , 2012, PLoS genetics.

[35]  Patrick D. Schloss,et al.  Reducing the Effects of PCR Amplification and Sequencing Artifacts on 16S rRNA-Based Studies , 2011, PloS one.

[36]  E. Guivier,et al.  |SE|S|AM|E| Barcode: NGS‐oriented software for amplicon characterization – application to species and environmental barcoding , 2012, Molecular ecology resources.

[37]  Mehrdad Hajibabaei,et al.  Biomonitoring 2.0: a new paradigm in ecosystem assessment made possible by next‐generation DNA sequencing , 2012, Molecular ecology.

[38]  T. Fennell,et al.  Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries , 2011, Genome Biology.

[39]  L. Orlando,et al.  Meta‐barcoding of ‘dirt’ DNA from soil reflects vertebrate biodiversity , 2012, Molecular ecology.

[40]  P. G. Taylor Reproducibility of ancient DNA sequences from extinct Pleistocene fauna. , 1996, Molecular biology and evolution.

[41]  Katherine H. Huang,et al.  Structure, Function and Diversity of the Healthy Human Microbiome , 2012, Nature.

[42]  Kristine Bohmann,et al.  Molecular Diet Analysis of Two African Free-Tailed Bats (Molossidae) Using High Throughput Sequencing , 2011, PloS one.

[43]  Robert C. Edgar,et al.  MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.

[44]  Dáithí C. Murray,et al.  High-throughput sequencing of ancient plant and mammal DNA preserved in herbivore middens , 2012 .

[45]  Martin Kircher,et al.  High‐throughput DNA sequencing – concepts and limitations , 2010, BioEssays : news and reviews in molecular, cellular and developmental biology.

[46]  R. Ward,et al.  Complete mitochondrial genome sequences of two extinct moas clarify ratite evolution , 2001, Nature.

[47]  R. Knight,et al.  Advancing analytical algorithms and pipelines for billions of microbial sequences. , 2012, Current opinion in biotechnology.

[48]  François Pompanon,et al.  DNA metabarcoding and the cytochrome c oxidase subunit I marker: not a perfect match , 2014, Biology Letters.

[49]  T. Dallman,et al.  Performance comparison of benchtop high-throughput sequencing platforms , 2012, Nature Biotechnology.

[50]  K. Eric Wommack,et al.  Groundtruthing Next-Gen Sequencing for Microbial Ecology–Biases and Errors in Community Structure Estimates from PCR Amplicon Pyrosequencing , 2012, PloS one.

[51]  Robert C. Edgar,et al.  UPARSE: highly accurate OTU sequences from microbial amplicon reads , 2013, Nature Methods.

[52]  K. Robasky,et al.  The role of replicates for error mitigation in next-generation sequencing , 2013, Nature Reviews Genetics.

[53]  J. Archer,et al.  Use of Four Next-Generation Sequencing Platforms to Determine HIV-1 Coreceptor Tropism , 2012, PloS one.

[54]  P. Taberlet,et al.  Who is eating what: diet assessment using next generation sequencing , 2012, Molecular ecology.

[55]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[56]  James F. Meadow,et al.  Significant changes in the skin microbiome mediated by the sport of roller derby , 2013, PeerJ.

[57]  Daniel J. Blankenberg,et al.  Galaxy: a platform for interactive large-scale genome analysis. , 2005, Genome research.

[58]  B. Faircloth,et al.  Not All Sequence Tags Are Created Equal: Designing and Validating Sequence Identification Tags Robust to Indels , 2012, PloS one.

[59]  Modular tagging of amplicons using a single PCR for high‐throughput sequencing , 2014, Molecular ecology resources.

[60]  Tsunglin Liu,et al.  Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly , 2013, PloS one.

[61]  P. Schloss,et al.  Dynamics and associations of microbial community types across the human body , 2014, Nature.

[62]  Antti Sajantila Editors’ Pick: Contamination has always been the issue! , 2014, Investigative Genetics.

[63]  R. Knight,et al.  Forensic identification using skin bacterial communities , 2010, Proceedings of the National Academy of Sciences.

[64]  P. Taberlet,et al.  Replication levels, false presences and the estimation of the presence/absence from eDNA metabarcoding data , 2015, Molecular ecology resources.

[65]  A. Cooper DNA from Museum Specimens , 1994 .

[66]  A. Nekrutenko,et al.  Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences , 2010, Genome Biology.

[67]  Robert C. Edgar,et al.  BIOINFORMATICS APPLICATIONS NOTE , 2001 .

[68]  M. Bunce,et al.  Identifying conservation units after large‐scale land clearing: a spatio‐temporal molecular survey of endangered white‐tailed black cockatoos (Calyptorhynchus spp.) , 2014 .

[69]  Elizabeth L Clare,et al.  High-throughput sequencing offers insight into mechanisms of resource partitioning in cryptic bat species , 2011, Ecology and evolution.

[70]  W. E. Harris,et al.  An emergent science on the brink of irrelevance: a review of the past 8 years of DNA barcoding , 2012, Molecular ecology resources.

[71]  Daniel J. Blankenberg,et al.  Galaxy: A Web‐Based Genome Analysis Tool for Experimentalists , 2010, Current protocols in molecular biology.

[72]  James Haile,et al.  Who's for dinner? High‐throughput sequencing reveals bat dietary differentiation in a biodiversity hotspot where prey taxonomy is largely undescribed , 2014, Molecular ecology.

[73]  M. Hofreiter,et al.  Ancient DNA , 2019, Methods in Molecular Biology.

[74]  H. Chu,et al.  High throughput sequencing analysis of biogeographical distribution of bacterial communities in the black soils of northeast China , 2014 .

[75]  K. Katoh,et al.  MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. , 2002, Nucleic acids research.

[76]  Lounès Chikhi,et al.  A DNA Metabarcoding Study of a Primate Dietary Diversity and Plasticity across Its Entire Fragmented Range , 2013, PloS one.

[77]  Mark J. Clement,et al.  Targeted Amplicon Sequencing (TAS): A Scalable Next-Gen Approach to Multilocus, Multitaxa Phylogenetics , 2011, Genome biology and evolution.

[78]  P. Taberlet,et al.  Towards next‐generation biodiversity assessment using DNA metabarcoding , 2012, Molecular ecology.

[79]  D. G. Wang,et al.  Solid-phase reversible immobilization for the isolation of PCR products. , 1995, Nucleic acids research.

[80]  J. Galindo,et al.  Applications of next generation sequencing in molecular ecology of non-model organisms , 2011, Heredity.

[81]  Thierry Vermat,et al.  Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding , 2006, Nucleic acids research.

[82]  Joakim Lundeberg,et al.  Large Scale Library Generation for High Throughput Sequencing , 2011, PloS one.

[83]  Joakim Lundeberg,et al.  Increased Throughput by Parallelization of Library Preparation for Massive Sequencing , 2010, PloS one.

[84]  Feng Wang,et al.  A long-term field experiment of soil transplantation demonstrating the role of contemporary geographic separation in shaping soil microbial community structure , 2014, Ecology and evolution.

[85]  Robi David Mitra,et al.  Nested Patch PCR enables highly multiplexed mutation discovery in candidate genes. , 2008, Genome research.

[86]  Matthew Mayho,et al.  Evaluation and optimisation of preparative semi‐automated electrophoresis systems for Illumina library preparation , 2012, Electrophoresis.

[87]  N. Dracopoli,et al.  Current protocols in human genetics , 1994 .

[88]  Charlotte L. Oskam,et al.  Quantitative real-time PCR in aDNA research. , 2012, Methods in molecular biology.

[89]  J. Palmer,et al.  Investigating Deep Phylogenetic Relationships among Cyanobacteria and Plastids by Small Subunit rRNA Sequence Analysis 1 , 1999, The Journal of eukaryotic microbiology.

[90]  P. Taberlet,et al.  Using next‐generation sequencing for molecular reconstruction of past Arctic vegetation and climate , 2010, Molecular ecology resources.

[91]  P. Taberlet,et al.  DNA metabarcoding multiplexing and validation of data accuracy for diet assessment: application to omnivorous diet , 2014, Molecular ecology resources.

[92]  M. Nei,et al.  MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. , 2011, Molecular biology and evolution.

[93]  Kabir G. Peay,et al.  Sequence Depth, Not PCR Replication, Improves Ecological Inference from Next Generation DNA Sequencing , 2014, PloS one.

[94]  Jonathan P. Bollback,et al.  The Use of Coded PCR Primers Enables High-Throughput Sequencing of Multiple Homolog Amplification Products by 454 Parallel Sequencing , 2007, PloS one.

[95]  Jesse Dabney,et al.  Length and GC-biases during sequencing library amplification: a comparison of various polymerase-buffer systems with ancient and modern DNA sequencing libraries. , 2012, BioTechniques.

[96]  V. Beneš,et al.  The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. , 2009, Clinical chemistry.