Space and time optimal parallel sequence alignments

We present the first space- and time-optimal parallel algorithm for the pairwise sequence alignment problem, a fundamental problem in computational biology. This problem can be solved sequentially in O(mn) time and O(m+n) space, where m and n are the lengths of the sequences to be aligned. The fastest known parallel space-optimal algorithm for pairwise sequence alignment takes optimal O((m+n)/p) space, but suboptimal O((m+n)^2/p) time, where p is the number of processors. On the other hand, the most space-economical time-optimal parallel algorithm takes O(mn/p) time, but O(m + n/p) space. We close this gap by presenting an algorithm that achieves both time and space optimality, i.e., it requires only O((m+n)/p) space and O(mn/p) time. We also present an experimental evaluation of the proposed algorithm on an IBM xSeries cluster. Although presented in the context of full sequence alignments, our algorithm is applicable to other alignment problems in computational biology, including local alignments and syntenic alignments. It is also a useful addition to the range of techniques available for parallel dynamic programming.
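
As a point of reference for the sequential bounds quoted above, the following minimal sketch computes a global alignment score in O(mn) time while storing only two rows of the dynamic programming table. It is an illustration, not the parallel algorithm of this paper: the function name `alignment_score` and the scoring parameters (`match`, `mismatch`, and a linear `gap` penalty) are assumptions chosen for the example.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a and b using two table rows.

    The scoring scheme (match/mismatch/linear gap) is an assumption
    made for this sketch; it is not taken from the paper.
    """
    # Keep the shorter sequence along the row so each row is as small as possible.
    if len(b) > len(a):
        a, b = b, a

    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]

    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned with a gap
            left = curr[j - 1] + gap  # b[j-1] aligned with a gap
            curr[j] = max(diag, up, left)
        prev = curr  # only the previous row is needed for the next one

    return prev[-1]


if __name__ == "__main__":
    print(alignment_score("GATTACA", "GCATGCU"))
```

This sketch returns only the optimal score. Recovering the alignment itself within linear space requires an additional divide-and-conquer step in the style of Hirschberg's technique, which is omitted here.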
