FMSA: FPGA-Accelerated ClustalW-Based Multiple Sequence Alignment through Pipelined Prefiltering

Multiple Sequence Alignment (MSA) is perhaps second only to pairwise sequence alignment in overall importance in bioinformatics; it is critical, for example, in determining the structure and function of molecules from putative families of sequences. But while pairwise sequence alignment has been the subject of scores of FPGA acceleration studies, MSA has been the subject of only a few. The most important of these accelerate ClustalW, the most commonly used MSA code, either by implementing the first of its three phases (over 90% of the run time) with Dynamic Programming (DP) methods, or by accelerating the third phase, which consumes most of the remaining time. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of 80× to 190× over the CPU code (8 cores) and a speedup of 2.5× to 8× over DP/FPGA- and GPU-based methods. When combined with a recently published method for phase 3, and using the original software for phase 2, the end-to-end speedup is at least 50× over an 8-core implementation of the original code. The alignment quality is comparable to that of the original code according to a commonly used benchmark suite, evaluated with respect to multiple distance metrics.
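To make the idea of BLAST-style prefiltering for the all-pairs phase concrete, the following is a minimal software sketch, not the FPGA pipeline described in this work. It assumes a simple exact word-match filter: for each pair of sequences it counts exact k-mer matches that fall on the same diagonal and keeps only pairs whose best diagonal reaches a threshold. The function names (`kmer_index`, `best_diagonal_hits`, `prefilter_pairs`) and the parameters `k` and `min_hits` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def kmer_index(seq, k):
    """Map each length-k word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def best_diagonal_hits(a, b, k=3):
    """Return the largest number of exact k-mer matches that share one diagonal.

    A match of a[i:i+k] with b[j:j+k] lies on diagonal i - j; many matches on
    one diagonal suggest an ungapped region of similarity.
    """
    index_b = kmer_index(b, k)
    diagonals = defaultdict(int)
    for i in range(len(a) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for j in index_b.get(a[i:i + k], ()):
            diagonals[i - j] += 1
    return max(diagonals.values(), default=0)


def prefilter_pairs(seqs, k=3, min_hits=2):
    """Score all pairs and keep those whose best diagonal has >= min_hits matches.

    Assumption for illustration only: surviving pairs would then be passed to
    a more expensive alignment or scoring step.
    """
    scores = {}
    for (i, a), (j, b) in combinations(enumerate(seqs), 2):
        hits = best_diagonal_hits(a, b, k)
        if hits >= min_hits:
            scores[(i, j)] = hits
    return scores
```

Calling `prefilter_pairs` on a list of sequences returns a dictionary keyed by index pairs `(i, j)` with `i < j`. The word length and threshold trade sensitivity against the number of pairs that reach the later, more costly stage.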
