On the Parallelization of Bioinformatic Applications

This document surveys the computational strategies used to parallelize the most widely used software in bioinformatics. The algorithms studied are computationally expensive, and their computational patterns range from regular (database searching applications) to highly irregular (phylogenetic trees). Fine- and coarse-grained parallel strategies are discussed for this diverse set of applications. The overview outlines computational issues related to parallelism, physical machine models, parallel programming approaches, and scheduling strategies for a broad range of computer architectures. In particular, it covers shared-memory, distributed-memory, and hybrid shared/distributed-memory architectures.
