DRISEE overestimates errors in metagenomic sequencing data

The extremely high error rates reported by Keegan et al. in ‘A platform-independent method for detecting errors in metagenomic sequencing data: DRISEE’ (PLoS Comput Biol 2012;8:e1002541) for many next-generation sequencing datasets prompted us to re-examine their results. Our analysis reveals that the presence of conserved artificial sequences, e.g. Illumina adapters, and other naturally occurring sequence motifs accounts for most of the reported errors. We conclude that DRISEE reports inflated levels of sequencing error, particularly for Illumina data. Tools offered for evaluating large datasets need scrupulous review before they are implemented.
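One way a conserved sequence could inflate such an estimate is sketched below. This is a toy illustration and not DRISEE's implementation. It assumes a simplified estimator that bins reads sharing an identical prefix as presumed duplicates and counts every disagreement with the per-position consensus in the rest of the read as a sequencing error. The function name, the prefix length, the adapter-like sequence and all reads are hypothetical.

```python
from collections import Counter, defaultdict

PREFIX_LEN = 8  # hypothetical, kept short for the toy reads below


def prefix_bin_error_rate(reads, prefix_len=PREFIX_LEN):
    """Bin reads by identical prefix and return the fraction of downstream
    bases that disagree with the per-position consensus of their bin."""
    bins = defaultdict(list)
    for read in reads:
        bins[read[:prefix_len]].append(read[prefix_len:])

    mismatches = 0
    compared = 0
    for tails in bins.values():
        if len(tails) < 2:
            continue  # a single read has no consensus to compare against
        length = min(len(tail) for tail in tails)
        for i in range(length):
            column = [tail[i] for tail in tails]
            consensus_count = Counter(column).most_common(1)[0][1]
            mismatches += len(column) - consensus_count
            compared += len(column)
    return mismatches / compared if compared else 0.0


# Four error-free copies of one template: genuine duplicates.
duplicates = ["TTGACCAGGATCCA"] * 4

# Four reads from different templates that all start with the same
# made-up adapter-like sequence (not a real adapter).
ADAPTER_LIKE = "ACGTACGT"
adapter_reads = [ADAPTER_LIKE + tail
                 for tail in ("AAAAAA", "CCCCCC", "GGGGGG", "AAAAAA")]

print(prefix_bin_error_rate(duplicates))
print(prefix_bin_error_rate(adapter_reads))
print(prefix_bin_error_rate(duplicates + adapter_reads))
```

In this sketch the genuine duplicates contribute no mismatches, while the reads that merely share the conserved prefix are binned together and their real biological differences are scored as errors. Removing such conserved sequences before binning would be the obvious check on any estimate of this kind.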
