Biosequence similarity search on the Mercury system

Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement from a data store into reconfigurable hardware. An important part of deploying an application on the Mercury system is its functional decomposition across the reconfigurable hardware and the traditional processor. We describe both the Mercury BLASTN application design and its performance analysis.
