Efficient Pairwise Statistical Significance Estimation using FPGAs

We present a fast pairwise statistical significance estimator built on a Field Programmable Gate Array (FPGA) coprocessor. The running time of pairwise statistical significance estimation is dominated by the hundreds of local alignments it must compute. By offloading the alignment task to an accelerator designed to process multiple independent alignments concurrently, we increase the end-to-end performance of the algorithm by more than 200x over a baseline software implementation. The proposed accelerator outperforms optimized multicore software implementations and other FPGA implementations for pairwise statistical significance estimation.
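To illustrate why the alignment step dominates the running time, the following is a minimal software sketch of a shuffle-based estimator. It is not the accelerator design or the authors' implementation. It assumes the null score distribution is sampled by aligning one sequence against shuffled copies of the other. The scoring parameters (`match`, `mismatch`, a linear `gap` penalty), the function names, and the method-of-moments Gumbel fit are all assumptions chosen for brevity; a production estimator would more likely use a substitution matrix, affine gaps, and a maximum-likelihood fit.

```python
import math
import random


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty, assumed)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def pairwise_significance(a, b, n_shuffles=200, seed=0):
    """Estimate the significance of the a-vs-b score from shuffled copies of b."""
    rng = random.Random(seed)
    observed = sw_score(a, b)

    # Each shuffle costs one full local alignment; the alignments are
    # independent of one another, so they can be computed concurrently.
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null_scores.append(sw_score(a, "".join(letters)))

    # Empirical p-value with a pseudocount so it is never exactly zero.
    hits = sum(1 for s in null_scores if s >= observed)
    p_empirical = (hits + 1) / (n_shuffles + 1)

    # Method-of-moments Gumbel fit (an assumed simplification).
    mean = sum(null_scores) / n_shuffles
    var = sum((s - mean) ** 2 for s in null_scores) / max(n_shuffles - 1, 1)
    p_gumbel = None
    if var > 0:
        beta = math.sqrt(6.0 * var) / math.pi
        mu = mean - 0.5772156649 * beta  # Euler-Mascheroni constant
        z = (observed - mu) / beta
        # P(S >= observed) = 1 - exp(-exp(-z)); clamp to avoid overflow.
        p_gumbel = -math.expm1(-math.exp(min(-z, 700.0)))

    return {
        "observed": observed,
        "p_empirical": p_empirical,
        "p_gumbel": p_gumbel,
        "n_alignments": n_shuffles + 1,
    }


if __name__ == "__main__":
    print(pairwise_significance("ACGTACGTTGCA" * 2, "ACGTTCGTTGCA" * 2))
```

With the default `n_shuffles=200`, a single sequence pair requires 201 alignments, and the loop body contains no dependency between iterations. That independence is what makes the workload a natural fit for an accelerator that processes many alignments at once.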
