Paper information - An Efficient Approach to Merging Paired-End Reads and Incorporation of Uncertainties

Next-Generation Sequencing (NGS) technologies have reshaped the landscape of life sciences. The massive amount of data generated by NGS is rapidly transforming biological research from traditional wet-lab work into a data-intensive analytical discipline (Koboldt et al., Cell 155(1):27–38, 2013). The Illumina “sequencing by synthesis” technique (Mardis, Annu Rev Genomics Hum Genet 9:387–402, 2008) is one of the most popular and widely used NGS technologies.

[1] S. B. Needleman, et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins. 1970, Journal of molecular biology.

[2] Richard W. Hamming, et al. Error detecting and error correcting codes. 1950.

[3] O. Gotoh. An improved algorithm for matching biological sequences. 1982, Journal of molecular biology.

[4] Konrad H. Paszkiewicz, et al. De novo assembly of short sequence reads. 2010, Briefings Bioinform.

[5] Robert C. Edgar, et al. Error filtering, pair assembly and error correction for next-generation sequencing reads. 2015, Bioinform.

[6] Steven Salzberg, et al. BIOINFORMATICS ORIGINAL PAPER. 2004.

[7] E. Mardis. Next-generation DNA sequencing methods. 2008, Annual review of genomics and human genetics.

[8] Jiajie Zhang, et al. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. 2013, Bioinform.

[9] R. Wilson, et al. The Next-Generation Sequencing Revolution and Its Impact on Genomics. 2013, Cell.

[10] Ben Nichols, et al. VSEARCH: a versatile open source tool for metagenomics. 2022.

[11] Vladimir I. Levenshtein, et al. Binary codes capable of correcting deletions, insertions, and reversals. 1965.

[12] Steven L Salzberg, et al. Fast gapped-read alignment with Bowtie 2. 2012, Nature Methods.

[13] P Green, et al. Base-calling of automated sequencer traces using phred. II. Error probabilities. 1998, Genome research.

[14] M S Waterman, et al. Identification of common molecular subsequences. 1981, Journal of molecular biology.

[15] Margaret C. Linak, et al. Sequence-specific error profile of Illumina sequencers. 2011, Nucleic acids research.

[16] Torbjørn Rognes, et al. Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors. 2000, Bioinform.

[17] Daniel G. Brown, et al. PANDAseq: paired-end assembler for Illumina sequences. 2012, BMC Bioinformatics.

[18] S F Altschul, et al. Local alignment statistics. 1996, Methods in enzymology.

[19] H. Swerdlow, et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. 2012, BMC Genomics.