Hyper-customized processors for bio-sequence database scanning on FPGAs

Protein sequences of unknown function are often compared against a set of known sequences to detect functional similarities. Efficient dynamic-programming algorithms exist for this problem; however, current solutions still require significant scan times, and these requirements are likely to become even more severe as databases continue to grow exponentially. In this paper we present a new approach to bio-sequence database scanning that uses reconfigurable FPGA-based hardware platforms to achieve high performance at low cost. We have designed efficient mappings of the Smith-Waterman algorithm onto fine-grained parallel processing elements (PEs) that are tailored to the parameters of a query. We exploit customization opportunities available at run-time to dynamically hyper-customize the systolic array and make better use of the available resources. Our FPGA implementation achieves a speedup of approximately 170 for linear gap penalties and 125 for affine gap penalties compared with a standard desktop computing platform. We show how run-time hyper-customization can be used to improve performance further.
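To make the systolic mapping concrete, the following is a minimal software sketch, not the paper's FPGA design. It scores local alignments with the Smith-Waterman recurrence and a linear gap penalty, once with an ordinary row-by-row loop and once as a clock-by-clock simulation of a linear systolic array in which PE i holds query residue i and database residues shift one PE per cycle. The function names, the simple match/mismatch scheme (used in place of a substitution matrix such as BLOSUM62) and the per-PE lookup table standing in for query-specific customization are all illustrative assumptions. An affine-gap variant would need two extra registers per PE for the gap-open/extend states and is not shown.

```python
"""Sketch: Smith-Waterman (linear gap) on a simulated linear systolic array.

Assumptions, not from the paper: match/mismatch scoring instead of a
substitution matrix, and exactly one PE per query residue.
"""
from typing import List, Optional


def sw_reference(query: str, db: str, match: int = 2, mismatch: int = -1, gap: int = 1) -> int:
    """Row-by-row dynamic programming; returns the best local alignment score."""
    prev = [0] * (len(db) + 1)
    best = 0
    for qc in query:
        cur = [0] * (len(db) + 1)
        for j, dc in enumerate(db, start=1):
            s = match if qc == dc else mismatch
            # H(i,j) = max(0, diagonal + s, up - gap, left - gap)
            cur[j] = max(0, prev[j - 1] + s, prev[j] - gap, cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best


def sw_systolic(query: str, db: str, match: int = 2, mismatch: int = -1, gap: int = 1) -> int:
    """Clock-by-clock simulation: PE i holds query[i]; db residues move one PE per cycle."""
    m = len(query)
    # Hypothetical "customization": each PE gets a score table specialised to its
    # fixed query residue, loosely analogous to folding query constants into logic.
    alphabet = set(query) | set(db)
    tables = [{c: (match if c == q else mismatch) for c in alphabet} for q in query]

    char_reg: List[Optional[str]] = [None] * m  # residue each PE processed last cycle
    h_reg = [0] * m     # H each PE produced last cycle (its left value next cycle)
    diag_reg = [0] * m  # value received from PE i-1 last cycle (diagonal next cycle)
    best = 0

    for t in range(len(db) + m - 1):
        # Synchronous update: every PE reads last cycle's registers.
        new_char: List[Optional[str]] = [None] * m
        new_h, new_diag = [0] * m, [0] * m
        for i in range(m):
            if i == 0:
                c = db[t] if t < len(db) else None
                up = 0  # boundary row above the first query residue
            else:
                c = char_reg[i - 1]
                up = h_reg[i - 1]
            if c is None:  # PE idle: no residue in flight yet (or any more)
                continue
            h = max(0, diag_reg[i] + tables[i][c], up - gap, h_reg[i] - gap)
            new_char[i], new_h[i], new_diag[i] = c, h, up
            best = max(best, h)
        char_reg, h_reg, diag_reg = new_char, new_h, new_diag
    return best
```

Both functions compute the same scores, for example `sw_systolic("ACGT", "ACGT")` and `sw_reference("ACGT", "ACGT")` both return 8. The simulation makes the source of the hardware speedup visible: with m PEs, all cells on one anti-diagonal of the dynamic-programming matrix are computed in the same cycle, so a database of length n is scanned in n + m - 1 cycles rather than n·m sequential cell updates.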
