Multidimensional Dynamic Programming for Homology Search on Distributed Systems

Alignment problems in computational biology have attracted attention recently because of the rapid growth of sequence databases. Computing an alignment reveals the similarity among sequences. Dynamic programming finds an optimal alignment, but it requires very long computation time. We have previously shown that dynamic programming for more than two sequences can be processed efficiently on a compact system consisting of an off-the-shelf FPGA board and its host computer (a node). That performance is, however, not sufficient for comparing long sequences. In this paper, we describe a computation method for multidimensional dynamic programming on distributed systems. The method is currently being tested using two nodes connected by Ethernet. According to our experiments, a 5.1-fold speedup is achievable with 16 nodes, and greater speedup can be expected when comparing longer sequences with more nodes. When comparing long sequences, the performance is affected only slightly by the data transfer delay. Therefore, our method can be mapped onto any kind of network, even one with large delays.
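To make "dynamic programming for more than two sequences" concrete, the following is a minimal software sketch of a three-dimensional alignment recurrence. It is not the FPGA or distributed method described in this paper. The sum-of-pairs scoring, the match/mismatch/gap values, and the names `sp_score` and `align3_score` are all assumptions chosen for illustration.

```python
from itertools import product


def sp_score(a, b, c, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one alignment column; '-' marks a gap.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    total = 0
    for x, y in ((a, b), (a, c), (b, c)):
        if x == "-" and y == "-":
            continue  # two gaps facing each other contribute nothing
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


def align3_score(s, t, u):
    """Optimal global alignment score of three sequences.

    Fills a 3-D table in which cell (i, j, k) holds the best score for the
    prefixes s[:i], t[:j], u[:k]. Each cell depends on up to 7 predecessors,
    one per non-empty subset of sequences that advance in the last column.
    """
    n, m, p = len(s), len(t), len(u)
    neg_inf = float("-inf")
    table = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    table[0][0][0] = 0
    # All 7 moves: each sequence either consumes a character (1) or takes a gap (0).
    moves = [d for d in product((0, 1), repeat=3) if any(d)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(p + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue  # predecessor lies outside the table
                    a = s[pi] if di else "-"
                    b = t[pj] if dj else "-"
                    c = u[pk] if dk else "-"
                    cand = table[pi][pj][pk] + sp_score(a, b, c)
                    if cand > table[i][j][k]:
                        table[i][j][k] = cand
    return table[n][m][p]
```

The table has (n+1)(m+1)(p+1) cells, so time and memory grow with the product of the sequence lengths. This growth is the reason long sequences call for hardware acceleration or distribution across nodes.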
