Review of Genome Sequence Short Read Error Correction Algorithms

Ne xt -generation high throughput sequencing technologies have opened up a wide range of new genome research opportunities. High throughput sequencing technologies produces a massive amount of short reads data in a single run. The large dataset produced by short read sequencing technologies are h ighly error-p rone as compared to tradit ional Sanger sequencing approaches. These errors are critical and removing them is challenging. Therefore, there are peremptory demands for statistical tools for bioinformat ics to analyze such large amounts of data. In this paper, we present review of and measuring parameters associated with genome sequence short read errors correction tools and algorithms. We further present comprehensive detail of datasets and results obtained with defined parameters. The reviews present the current state of the art and future directions in the area as well.

[1]  P. Pevzner,et al.  An Eulerian path approach to DNA fragment assembly , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[2]  Haixu Tang,et al.  Fragment assembly with short reads , 2004, Bioinform..

[3]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[4]  Eugene W. Myers,et al.  A whole-genome assembly of Drosophila. , 2000, Science.

[5]  Mark J. P. Chaisson,et al.  Short read fragment assembly of bacterial genomes. , 2008, Genome research.

[6]  Hui Guo,et al.  MapView: visualization of short reads alignment on a desktop computer , 2009, Bioinform..

[7]  M. Schatz,et al.  Assembly of large genomes using second-generation sequencing. , 2010, Genome research.

[8]  Gabor T. Marth,et al.  EagleView: a genome assembly viewer for next-generation sequencing technologies. , 2008, Genome research.

[9]  Lucian Ilie,et al.  SHRiMP2: Sensitive yet Practical Short Read Mapping , 2011, Bioinform..

[10]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[11]  J. Lupski,et al.  The complete genome of an individual by massively parallel DNA sequencing , 2008, Nature.

[12]  Wei Xiong,et al.  PSAEC: An Improved Algorithm for Short Read Error Correction Using Partial Suffix Arrays , 2011, FAW-AAIM.

[13]  Paul D. Shaw,et al.  BIOINFORMATICS APPLICATIONS NOTE , 2022 .

[14]  G. Hon,et al.  Next-generation genomics: an integrative approach , 2010, Nature Reviews Genetics.

[15]  Huanming Yang,et al.  De novo assembly of human genomes with massively parallel short read sequencing. , 2010, Genome research.

[16]  Leena Salmela,et al.  Correction of sequencing errors in a mixed set of reads , 2010, Bioinform..

[17]  René L. Warren,et al.  Assembling millions of short DNA sequences using SSAKE , 2006, Bioinform..

[18]  C. Ponting,et al.  Genome assembly quality: assessment and improvement using the neutral indel model. , 2010, Genome research.

[19]  Lucian Ilie,et al.  HiTEC: accurate error correction in high-throughput sequencing data , 2011, Bioinform..

[20]  Phil Green,et al.  Whole-genome disassembly , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[21]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[22]  Jan Schröder,et al.  Genome analysis SHREC : a short-read error correction method , 2009 .

[23]  Andrew H. Chan,et al.  ECHO: a reference-free short-read error correction algorithm. , 2011, Genome research.

[24]  E. Arner,et al.  Correcting errors in shotgun sequences. , 2003, Nucleic acids research.

[25]  C. Nusbaum,et al.  ALLPATHS: de novo assembly of whole-genome shotgun microreads. , 2008, Genome research.

[26]  Yongchao Liu,et al.  DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI , 2011, BMC Bioinformatics.

[27]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[28]  Gayle M. Wittenberg,et al.  EDAR: An Efficient Error Detection and Removal Algorithm for Next Generation Sequencing Data , 2010, J. Comput. Biol..

[29]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[30]  Clifford Stein,et al.  Introduction to Algorithms, 2nd edition. , 2001 .

[31]  Bertil Schmidt,et al.  A fast hybrid short read fragment assembly algorithm , 2009, Bioinform..

[32]  Gianluigi Zanetti,et al.  SEAL: a distributed short read mapping and duplicate removal tool , 2011, Bioinform..

[33]  Weiguo Liu,et al.  A Parallel Algorithm for Error Correction in High-Throughput Short-Read Data on CUDA-Enabled Graphics Hardware , 2010, J. Comput. Biol..

[34]  Steven J. M. Jones,et al.  Abyss: a Parallel Assembler for Short Read Sequence Data Material Supplemental Open Access , 2022 .

[35]  Hassan Mathkour,et al.  Genome Sequence Analysis: A Survey , 2009 .

[36]  G. Weinstock,et al.  The Atlas genome assembly system. , 2004, Genome research.

[37]  Juliane C. Dohm,et al.  SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. , 2007, Genome research.

[38]  Jan Schröder,et al.  BIOINFORMATICS ORIGINAL PAPER , 2022 .

[39]  F. Sanger,et al.  DNA sequencing with chain-terminating inhibitors. , 1977, Proceedings of the National Academy of Sciences of the United States of America.

[40]  Mark J. P. Chaisson,et al.  De novo fragment assembly with short mate-paired reads: Does the read length matter? , 2009, Genome research.

[41]  Hanlee P. Ji,et al.  Next-generation DNA sequencing , 2008, Nature Biotechnology.

[42]  B. Berger,et al.  ARACHNE: a whole-genome shotgun assembler. , 2002, Genome research.

[43]  Nancy F. Hansen,et al.  Accurate Whole Human Genome Sequencing using Reversible Terminator Chemistry , 2008, Nature.

[44]  David R. Kelley,et al.  Quake: quality-aware detection and correction of sequencing errors , 2010, Genome Biology.

[45]  Siu-Ming Yiu,et al.  SOAP2: an improved ultrafast tool for short read alignment , 2009, Bioinform..

[46]  Vincent J. Magrini,et al.  Extending assembly of short DNA sequences to handle error , 2007, Bioinform..

[47]  Srinivas Aluru,et al.  Reptile: representative tiling for short read error correction , 2010, Bioinform..

[48]  Srinivas Aluru,et al.  Repeat-aware modeling and correction of short read errors , 2011, BMC Bioinformatics.

[49]  J. Mullikin,et al.  SSAHA: a fast search method for large DNA databases. , 2001, Genome research.

[50]  L. Hillier,et al.  PCAP: a whole-genome assembly program. , 2003, Genome research.

[51]  Paul Medvedev,et al.  Error correction of high-throughput sequencing datasets with non-uniform coverage , 2011, Bioinform..

[52]  Heng Li,et al.  A survey of sequence alignment algorithms for next-generation sequencing , 2010, Briefings Bioinform..

[53]  R. Durbin,et al.  Mapping Quality Scores Mapping Short Dna Sequencing Reads and Calling Variants Using P

, 2022 .