Implementation of the Smith-Waterman algorithm on a reconfigurable supercomputing platform

The XD1000 is an innovative reconfigurable supercomputing platform developed by XtremeData Inc. to exploit the rapid progress of FPGA technology and the high performance of the HyperTransport interconnect. In this paper, we present implementations of the Smith-Waterman algorithm for both DNA and protein sequences on this platform. The main features are: (1) a multistage processing element (PE) design that significantly reduces FPGA resource usage and hence allows more parallelism to be exploited; (2) a pipelined control mechanism with uneven stage latencies, which is key to minimizing the overall PE pipeline cycle time; and (3) a compressed substitution matrix storage structure that substantially decreases on-chip SRAM usage. Finally, we implement a 384-PE systolic array running at 66.7 MHz, which achieves a peak performance of 25.6 GCUPS. Compared with the 2.2 GHz AMD Opteron host processor, the FPGA coprocessor achieves speedups of 185X and 250X, respectively, for the two implementations.
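For orientation, the recurrence that each PE evaluates and the arithmetic behind the peak figure can be sketched in software. This is a minimal illustration, not the hardware design described here: the function names `sw_score` and `peak_gcups`, the match/mismatch scores, and the linear gap penalty are all assumptions made for the example, and the peak calculation assumes every PE updates one matrix cell per clock cycle.

```python
def sw_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score.

    Assumes a simple match/mismatch score and a linear gap penalty;
    these values are illustrative, not taken from the paper.
    """
    # Only the previous row is kept, since each cell depends on its
    # upper, left, and upper-left neighbours.
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]  # first column is always zero for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def peak_gcups(num_pes, clock_mhz):
    """Peak giga cell updates per second.

    Assumes each PE completes one cell update per clock cycle.
    """
    return num_pes * clock_mhz * 1e6 / 1e9


if __name__ == "__main__":
    print(sw_score("GGTTGACTA", "TGTTACGG", match=3, mismatch=-3, gap=-2))
    print(peak_gcups(384, 66.7))
```

Under the one-cell-per-cycle assumption, 384 PEs at 66.7 MHz give roughly 25.6 GCUPS, which agrees with the peak figure quoted above. In a systolic array the cells along one anti-diagonal of the matrix are independent of each other, so they can be computed by different PEs in the same cycle; the sequential loops in the sketch compute the same values one at a time.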
