Advancements in genomic sequencing technology are causing genomic database growth to outpace Moore's Law. This continues to make genomic database search a difficult problem and a popular target for emerging processing technologies. The de facto software tool for genomic database search is NCBI BLAST, which operates by transforming each database query into a filter that is subsequently applied to the database. This requires a database scan for every query, so its performance is fundamentally limited by I/O bandwidth. In this paper we present a functionally equivalent variation on the NCBI BLAST algorithm that maps more suitably to an FPGA implementation. This variation attempts to reduce the I/O requirement by leveraging FPGA-specific capabilities, such as high pattern-matching throughput and explicit on-chip memory structure and allocation. Our algorithm transforms the database, not the query, into a filter that is stored as a hierarchical arrangement of three tables, the first two of which are stored on chip and the third off chip. Our results show that, while performance is data dependent, it is possible to achieve speedups of up to 8X based on the relative reduction in I/O of our approach versus that of NCBI BLAST. More importantly, the performance relative to NCBI BLAST improves with larger databases and larger query workloads.
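The idea of turning the database, rather than the query, into the filter can be illustrated with a minimal Python sketch. It is not the paper's design: the three-table hierarchy and its on-chip/off-chip split are not described in enough detail here to reproduce, so a single flat dictionary stands in for them. The function names, the word length `k=3`, the toy sequences, and the use of exact word matches only are all assumptions made for illustration.

```python
from collections import defaultdict


def build_database_index(sequences, k=3):
    """Map every length-k word to the (sequence id, offset) pairs where it occurs.

    Hypothetical stand-in for the database-derived filter; built once and
    reused across queries.
    """
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def find_seed_hits(index, query, k=3):
    """Return (query offset, sequence id, database offset) for each exact word match."""
    hits = []
    for q_off in range(len(query) - k + 1):
        word = query[q_off:q_off + k]
        # .get avoids inserting empty entries into the defaultdict.
        for seq_id, db_off in index.get(word, ()):
            hits.append((q_off, seq_id, db_off))
    return hits


# Toy data, assumed for illustration only.
database = ["ACGTACGT", "TTGACGAA"]
index = build_database_index(database)
hits = find_seed_hits(index, "GACG")
print(hits)
```

In this sketch each query only probes the prebuilt index, so the cost per query depends on the query length and the number of hits rather than on a full pass over the database, which is the I/O trade-off the abstract describes.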