Computing large-scale alignments on a multi-cluster

Molecular biologists frequently align the DNA sequences of entire genomes to detect important matched and mismatched regions. Although efficient dynamic programming algorithms exist for this problem, the required computing time remains very high because of the size of these sequences (usually a few million base pairs in length). Because the number of sequenced organisms is increasing rapidly, fast and accurate solutions are of the highest importance to research in this area. In this paper we present an algorithm that computes optimal and near-optimal alignments of two sequences in linear space and quadratic time. We demonstrate how this algorithm can be parallelized efficiently on a PC cluster and on a computational grid to reduce its runtime significantly. The grid implementation uses a hierarchical approach that combines inter-cluster and intra-cluster parallelism.
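
To make the linear-space, quadratic-time claim concrete, the following is a minimal sketch of a local alignment score computation that keeps only one previous row of the dynamic programming matrix. It is an illustration rather than the algorithm of this paper: the function name `local_alignment_score`, the linear gap penalty, and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return (best_score, end_i, end_j) for the best local alignment of a and b.

    Only two rows of the dynamic programming matrix are held at any time,
    so memory is O(len(b)) while time stays O(len(a) * len(b)).
    The scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)   # row i-1 of the matrix
    best = (0, 0, 0)            # (score, 1-based end in a, 1-based end in b)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)   # row i, column 0 stays 0
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr                 # discard the older row
    return best


if __name__ == "__main__":
    print(local_alignment_score("TTACGTT", "GGACGGG"))
```

This sketch returns only the score and the end position of the best local alignment. Recovering the alignment itself without storing the full matrix needs an additional step, for example a divide-and-conquer recursion in the style of Hirschberg, which is not shown here.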
