We present an FPGA-based search engine that implements the Smith-Waterman local sequence alignment algorithm to search a stream of input data for patterns up to 38 bytes long. The engine supports the gap, match, and mismatch weightings needed for genome sequence alignment, and it can also perform inexact string matching on generic text. Because the engine processes input data as they stream through it, it can operate directly on data that are unindexed and unsorted, i.e., on unstructured databases. The engine can sustain a search throughput of 100 MB/sec, which lets it process input data at the maximum sustained throughput of the hard drive (or array of drives) storing the data. To demonstrate performance, we compare the throughput of the engine embedded in a prototype system with the execution time of a direct software implementation of the search engine's kernel. Although the prototype system limits the input data rate to 40.5 MB/sec, we show how parallelism and pipelining allow the engine to sustain 100 MB/sec given a sufficiently fast method for delivering input data, and thereby yield a performance gain of two orders of magnitude over the software implementation.
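As a point of reference for the kernel described above, the following is a minimal software sketch of Smith-Waterman scoring applied to a stream, not the FPGA design itself. It assumes a linear gap penalty, and the weight values, the function name `smith_waterman_stream`, and its parameters are illustrative choices that do not come from the paper. The sketch keeps only one column of scores per input character, which is what allows the input to be consumed as a stream without indexing or sorting.

```python
def smith_waterman_stream(pattern, stream, match=2, mismatch=-1, gap=-1):
    """Score the best local alignment of `pattern` against `stream`.

    Returns (best_score, end_index), where end_index is the position in
    the stream at which the best-scoring alignment ends (-1 if no
    positive score is found). The weights are illustrative assumptions.
    """
    m = len(pattern)
    prev = [0] * (m + 1)          # scores for the previous stream character
    best_score, best_end = 0, -1

    for j, ch in enumerate(stream):
        curr = [0] * (m + 1)      # scores for the current stream character
        for i in range(1, m + 1):
            sub = match if pattern[i - 1] == ch else mismatch
            curr[i] = max(
                0,                    # start a new local alignment
                prev[i - 1] + sub,    # match or mismatch
                prev[i] + gap,        # gap: stream character left unpaired
                curr[i - 1] + gap,    # gap: pattern character left unpaired
            )
            if curr[i] > best_score:
                best_score, best_end = curr[i], j
        prev = curr

    return best_score, best_end
```

For example, `smith_waterman_stream("ACGT", "TTACGTTT")` reports an exact hit ending at stream index 5, while `smith_waterman_stream("ACGT", "AAGT")` still reports a positive score for the inexact match. In this sketch each cell in a column depends only on the previous column and on the cell above it, which is the kind of dependency structure that parallel, pipelined hardware can exploit.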