HiTEC: accurate error correction in high-throughput sequencing data

MOTIVATION High-throughput sequencing technologies produce very large amounts of data and sequencing errors constitute one of the major problems in analyzing such data. Current algorithms for correcting these errors are not very accurate and do not automatically adapt to the given data. RESULTS We present HiTEC, an algorithm that provides a highly accurate, robust and fully automated method to correct reads produced by high-throughput sequencing methods. Our approach provides significantly higher accuracy than previous methods. It is time and space efficient and works very well for all read lengths, genome sizes and coverage levels. AVAILABILITY The source code of HiTEC is freely available at www.csd.uwo.ca/~ilie/HiTEC/.

[1]  Gene Myers,et al.  Building Fragment Assembly String Graphs , 2005 .

[2]  Ting Chen,et al.  PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds , 2009, Bioinform..

[3]  Giorgio Valle,et al.  PASS: a program to align short sequences , 2009, Bioinform..

[4]  Meng Zhang,et al.  The next-generation sequencing technology: A technology review and future perspective , 2010, Science China Life Sciences.

[5]  David Hernández,et al.  De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. , 2008, Genome research.

[6]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[7]  Michael Brudno,et al.  SHRiMP: Accurate Mapping of Short Color-space Reads , 2009, PLoS Comput. Biol..

[8]  Michael C. Schatz,et al.  CloudBurst: highly sensitive read mapping with MapReduce , 2009, Bioinform..

[9]  Bin Ma,et al.  ZOOM! Zillions of oligos mapped , 2008, Bioinform..

[10]  Hiroki Arimura,et al.  Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications , 2001, CPM.

[11]  Vincent J. Magrini,et al.  Extending assembly of short DNA sequences to handle error , 2007, Bioinform..

[12]  Eugene W. Myers,et al.  The fragment assembly string graph , 2005, ECCB/JBI.

[13]  C. Nusbaum,et al.  ALLPATHS: de novo assembly of whole-genome shotgun microreads. , 2008, Genome research.

[14]  Leena Salmela,et al.  Correction of sequencing errors in a mixed set of reads , 2010, Bioinform..

[15]  Eugene W. Myers,et al.  Suffix arrays: a new method for on-line string searches , 1993, SODA '90.

[16]  Wing Hung Wong,et al.  SeqMap: mapping massive amount of oligonucleotides to the genome , 2008, Bioinform..

[17]  Dong Kyue Kim,et al.  Constructing suffix arrays in linear time , 2005, J. Discrete Algorithms.

[18]  Srinivas Aluru,et al.  Space efficient linear time construction of suffix arrays , 2005, J. Discrete Algorithms.

[19]  Steven J. M. Jones,et al.  Abyss: a Parallel Assembler for Short Read Sequence Data Material Supplemental Open Access , 2022 .

[20]  Jignesh M. Patel,et al.  ProbeMatch: rapid alignment of oligonucleotides to genome allowing both gaps and mismatches , 2009, Bioinform..

[21]  Srinivas Aluru,et al.  Reptile: representative tiling for short read error correction , 2010, Bioinform..

[22]  Steven J. M. Jones,et al.  Slider—maximum use of probability information for alignment of short sequence reads and SNP detection , 2008, Bioinform..

[23]  F. Sanger,et al.  DNA sequencing with chain-terminating inhibitors. , 1977, Proceedings of the National Academy of Sciences of the United States of America.

[24]  Mark J. P. Chaisson,et al.  De novo fragment assembly with short mate-paired reads: Does the read length matter? , 2009, Genome research.

[25]  René L. Warren,et al.  Assembling millions of short DNA sequences using SSAKE , 2006, Bioinform..

[26]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[27]  R. Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[28]  Hiroki Arimura,et al.  Linear-Time Longest-Common-Prex Computation in Sux Arrays and Its Applications , 2001 .

[29]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[30]  Weiguo Liu,et al.  A Parallel Algorithm for Error Correction in High-Throughput Short-Read Data on CUDA-Enabled Graphics Hardware , 2010, J. Comput. Biol..

[31]  R. Durbin,et al.  Mapping Quality Scores Mapping Short Dna Sequencing Reads and Calling Variants Using P

, 2022 .

[32]  Yuan Gao,et al.  MOM: maximum oligonucleotide mapping , 2009, Bioinform..

[33]  Michael Q. Zhang,et al.  Using quality scores and longer reads improves accuracy of Solexa read mapping , 2008, BMC Bioinformatics.

[34]  Ruiqiang Li,et al.  SOAP: short oligonucleotide alignment program , 2008, Bioinform..

[35]  Peter Sanders,et al.  Simple Linear Work Suffix Array Construction , 2003, ICALP.

[36]  Juliane C. Dohm,et al.  SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. , 2007, Genome research.

[37]  Jan Schröder,et al.  Genome analysis SHREC : a short-read error correction method , 2009 .

[38]  E. Mardis The impact of next-generation sequencing technology on genetics. , 2008, Trends in genetics : TIG.