Parallel Linear Space Algorithm for Large-Scale Sequence Alignment

Aligning long DNA sequences is a fundamental and common task in molecular biology. Though dynamic programming algorithms have been developed to solve this problem, the space and time required by these algorithms are still a challenge. In this paper we present the Parallel Linear Space Alignment (PLSA) algorithm to compute the long sequence alignment to meet this challenge. Using this algorithm, the local start points and grid cache partition the whole sequence alignment problem into several smaller independent subproblems. A novel dynamic load balancing approach then efficiently solves these subproblems in parallel, which provides more parallelism in the trace-back phase. Furthermore, PLSA helps to find k near-optimal non-intersecting alignments. Our experiments show that this proposed algorithm scales well with the increasing number of processors, and it exhibits almost linear speedup for large-scale sequences.

[1]  A. Apostolio,et al.  A Fast Linear Space Algorithm for Computing Longest Common Subsequences , 1985 .

[2]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[3]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[4]  Bertil Schmidt,et al.  Computing large-scale alignments on a multi-cluster , 2003, 2003 Proceedings IEEE International Conference on Cluster Computing.

[5]  Srinivas Aluru,et al.  Parallel biological sequence comparison using prefix computations , 1999, Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999.

[6]  Jonathan Schaeffer,et al.  FastLSA: a fast, linear-space, parallel and sequential algorithm for sequence alignment , 2003, 2003 International Conference on Parallel Processing, 2003. Proceedings..

[7]  Eugene W. Myers,et al.  Optimal alignments in linear space , 1988, Comput. Appl. Biosci..

[8]  S. Salzberg,et al.  Fast algorithms for large-scale genome alignment and comparison. , 2002, Nucleic acids research.

[9]  Daniel S. Hirschberg,et al.  A linear space algorithm for computing maximal common subsequences , 1975, Commun. ACM.

[10]  S. Salzberg,et al.  Alignment of whole genomes. , 1999, Nucleic acids research.

[11]  D. B. Davis,et al.  Intel Corp. , 1993 .