Systematic evaluation of error rates and causes in short samples in next-generation sequencing

Next-generation sequencing (NGS) is the method of choice when large numbers of sequences have to be obtained. Although the technique is widely applied, varying error rates have been observed. We analysed millions of reads obtained by sequencing a single sequence on an Illumina sequencer. According to our analysis, the index PCR used for sample preparation has no effect on the observed error rate, even though PCR is traditionally seen as one of the major contributors to elevated error rates in NGS. In addition, we observed very persistent pre-phasing effects, even though the base-calling software corrects for them. Removing shortened sequences abolished these effects and allowed analysis of the actual mutations. The average error rate was 0.24 ± 0.06% per base, and 6.4 ± 1.24% of sequences were mutated. Constant regions at the 5′- and 3′-ends, e.g., primer binding sites used in in vitro selection procedures, seem to have no effect on mutation rates, and re-sequencing of samples yields very reproducible results. As phasing effects and other sequencing problems vary between equipment and individual setups, we recommend that all NGS users evaluate error rates and error types to improve the quality and analysis of NGS data.
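The two summary statistics reported above (per-base error rate and percentage of mutated sequences) can be illustrated with a minimal sketch. It is not the pipeline used in the study, which is not described here. It assumes that every read is compared position by position against the single known reference, that "shortened sequences" are reads whose length differs from the reference, and that only substitutions are counted. The function name `error_stats` and the toy reads are hypothetical.

```python
def error_stats(reference, reads):
    """Return (per_base_error_rate, mutated_fraction, n_full_length_reads).

    Assumption: reads are already trimmed to the region covered by the
    reference, so a full-length read aligns to it without gaps.
    """
    # Drop shortened (or otherwise length-shifted) reads; a missing base
    # would make every downstream position look like a substitution.
    full_length = [r for r in reads if len(r) == len(reference)]
    if not full_length:
        raise ValueError("no full-length reads to analyse")

    total_mismatches = 0
    mutated_reads = 0
    for read in full_length:
        # Count positions where the read differs from the reference.
        mismatches = sum(1 for a, b in zip(read, reference) if a != b)
        total_mismatches += mismatches
        if mismatches > 0:
            mutated_reads += 1

    per_base = total_mismatches / (len(full_length) * len(reference))
    mutated_fraction = mutated_reads / len(full_length)
    return per_base, mutated_fraction, len(full_length)


# Toy data (hypothetical): one substitution and one shortened read.
reference = "ACGTACGTAC"
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTACGTA", "ACGTACGTAC"]
per_base, mutated_fraction, kept = error_stats(reference, reads)
print(f"kept {kept} reads, per-base error {per_base:.3%}, mutated {mutated_fraction:.1%}")
```

Filtering by length before counting mismatches mirrors the order described in the abstract: shortened reads are removed first, so that the remaining differences reflect actual mutations and not positional shifts.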
