FPGA accelerated DNA error correction

Correcting errors in DNA sequencing data is an important step that can improve the quality of downstream analyses. Although many error-correction methods have been proposed for Illumina reads, their throughput is not high enough to process data from large genomes. This paper describes the first FPGA-based error-correction tool, FPGA Accelerated DNA Error Correction (FADE), which aims to improve the throughput of DNA error correction for Illumina reads. FADE is based on the BLESS algorithm, which is highly accurate but slow. The Bloom filter that serves as BLESS's main data structure, together with BLESS's error-correction subroutines for different types of errors, has been implemented on an FPGA. We compared our design with the software version of BLESS on DNA sequencing data generated from four genomes and achieved a speedup of up to 43 times in the best case and 36 times on average.
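
To make the role of the Bloom filter concrete, the following is a minimal software sketch of k-mer membership testing with a Bloom filter. It is an illustration only, not the BLESS or FADE implementation: the class and function names, the SHA-256 double-hashing scheme, and the filter parameters are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings (here, k-mers). Hypothetical sketch."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: derive all bit positions from one SHA-256 digest
        # using double hashing (h1 + i * h2). Real tools use faster hashes.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read, k):
    """Return every length-k substring of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


# Usage: record the k-mers of one read, then query them.
trusted = BloomFilter(num_bits=1024, num_hashes=3)
for km in kmers("ACGTACGTGA", 4):
    trusted.add(km)
print("ACGT" in trusted)  # an inserted k-mer is always reported as present
```

Each query touches only a few bits at independent positions, which is the kind of access pattern that can be evaluated in parallel in hardware. The trade-off is that a Bloom filter can report false positives but never false negatives.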
