Mercury BLASTP: Accelerating Protein Sequence Alignment

Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, exponential growth in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built Mercury BLASTP, a high-performance FPGA-accelerated version of BLASTP. In this article, we describe the architecture of the portions of the application that are accelerated in the FPGA and their integration with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11 to 15 times faster than software BLASTP on a modern CPU while delivering results that are close to 99% identical.
