Rolling partial prefix-sums to speedup evaluation of uniform and affine recurrence equations

As multithreaded and reconfigurable logic architectures play an increasing role in high-performance computing (HPC), the scientific community needs new programming models for efficiently mapping existing applications onto these parallel platforms. In this paper, we show how tightly coupled, fine-grained parallelism in architectures such as GPUs and FPGAs can be exploited to speed up applications described by uniform recurrence equations. We introduce the concept of rolling partial prefix-sums, which dynamically track and resolve multiple dependencies without evaluating intermediate values. Rolling partial prefix-sums apply to the low-latency evaluation of dynamic programming problems expressed as uniform or affine recurrence equations. To assess our approach, we consider two common problems in computational biology: hidden Markov models (HMMER) for protein motif finding, and the Smith-Waterman algorithm. We present a platform-independent, linear-time solution to HMMER, which is traditionally solved in bilinear time, and a platform-independent, sub-linear-time solution to Smith-Waterman, which is normally solved in linear time.
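
To illustrate the general idea of resolving a recurrence dependency with a prefix computation, the following is a minimal sketch and not the rolling partial prefix-sum method of the paper itself. It assumes a simplified gap-style recurrence, F[j] = max(F[j-1] - g, c[j]), of the kind that appears along a row in Smith-Waterman-like dynamic programming. The function names, the input `c`, and the gap penalty `g` are hypothetical. Because max is associative, the running maximum could be computed by a parallel scan; here `itertools.accumulate` stands in for that scan and runs sequentially.

```python
from itertools import accumulate


def gap_recurrence_sequential(c, g):
    """Reference evaluation of F[j] = max(F[j-1] - g, c[j]), one step at a time."""
    out = []
    prev = float("-inf")  # no predecessor before the first element
    for cj in c:
        prev = max(prev - g, cj)
        out.append(prev)
    return out


def gap_recurrence_prefix(c, g):
    """Same values via a prefix maximum.

    Unrolling the recurrence gives F[j] = max over k <= j of (c[k] - (j - k) * g),
    which equals max over k <= j of (c[k] + k * g), minus j * g.
    The inner maximum is a prefix max, an associative scan.
    """
    shifted = [cj + k * g for k, cj in enumerate(c)]
    running = accumulate(shifted, max)  # sequential stand-in for a parallel scan
    return [m - j * g for j, m in enumerate(running)]


if __name__ == "__main__":
    scores = [3, 0, 5, 1, 1, 9]  # hypothetical per-cell scores
    print(gap_recurrence_sequential(scores, 2))
    print(gap_recurrence_prefix(scores, 2))
```

Both functions return the same list; the second form removes the step-by-step dependency on F[j-1], which is what makes a logarithmic-depth scan possible on parallel hardware.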
