BFC: correcting Illumina sequencing errors

UNLABELLED BFC is a free, fast and easy-to-use sequencing error corrector designed for Illumina short reads. It uses a non-greedy algorithm but still maintains a speed comparable to implementations based on greedy methods. In evaluations on real data, BFC appears to correct more errors with fewer overcorrections in comparison to existing tools. It particularly does well in suppressing systematic sequencing errors, which helps to improve the base accuracy of de novo assemblies. AVAILABILITY AND IMPLEMENTATION https://github.com/lh3/bfc CONTACT hengli@broadinstitute.org SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

[1]  Páll Melsted,et al.  Efficient counting of k-mers in DNA sequences using a bloom filter , 2011, BMC Bioinformatics.

[2]  Xiaolong Wu,et al.  BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads , 2014, Bioinform..

[3]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[4]  Steven J. M. Jones,et al.  Abyss: a Parallel Assembler for Short Read Sequence Data Material Supplemental Open Access , 2022 .

[5]  Gabor T. Marth,et al.  Haplotype-based variant detection from short-read sequencing , 2012, 1207.3907.

[6]  Dominique Lavenier,et al.  GATB: Genome Assembly & Analysis Tool Box , 2014, Bioinform..

[7]  B. Langmead,et al.  Lighter: fast and memory-efficient sequencing error correction without counting , 2014, Genome Biology.

[8]  Heng Li,et al.  Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly , 2012, Bioinform..

[9]  P. Pevzner,et al.  An Eulerian path approach to DNA fragment assembly , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Yongchao Liu,et al.  Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data , 2013, Bioinform..

[11]  Haixu Tang,et al.  Fragment assembly with short reads , 2004, Bioinform..

[12]  R. Durbin,et al.  Efficient de novo assembly of large genomes using compressed data structures. , 2012, Genome research.

[13]  Sebastian Deorowicz,et al.  KMC 2: Fast and resource-frugal k-mer counting , 2014, Bioinform..

[14]  Heng Li Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM , 2013, 1303.3997.

[15]  Peter Sanders,et al.  Cache-, Hash- and Space-Efficient Bloom Filters , 2007, WEA.