FPGA-accelerated seed generation in Mercury BLASTP

BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more runtime or a cluster of machines to keep pace. To address this problem, we have designed and built Mercury BLASTP, a high-performance FPGA-accelerated version of BLASTP. In this paper, we focus on seed generation, the first stage of the BLASTP algorithm. Our seed generator can process database residues at up to 219 Mresidues/second for 2048-residue queries. The full Mercury BLASTP pipeline, including our seed generator, achieves a 37× speedup over the popular NCBI BLASTP software running on a 2.8 GHz Intel P4 CPU, while retaining more than 99% of the software's sensitivity. Our architecture can be generalized to accelerate the seed generation stage in other important biocomputing applications.
