Coupling hundreds of workstations for parallel molecular sequence analysis

We present a highly scalable approach to distributed parallel computing on workstations connected through the Internet, which provides significant speedup for sequence analysis in molecular biology. Recent developments show that small numbers of workstations connected via a local area network can be used efficiently for parallel computing. This work instead emphasizes scalability with respect to the number of workstations employed. We show that a massively parallel approach using several hundred workstations, dispersed over all continents, can be applied successfully to problems with low communication bandwidth requirements. We calculated the optimal local alignment scores between a single genetic sequence and all sequences of a genetic sequence database using the ssearch code, which is well known among molecular biologists. In a heterogeneous network of more than 800 workstations, this job terminated after several minutes, in contrast to the several days it would have taken on a single machine.
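The database search described above can be illustrated with a minimal sketch. It is not the ssearch code and not the distribution system used in this work. It assumes a simple Smith-Waterman recurrence with a linear gap penalty and made-up scoring parameters (match, mismatch, gap), and all function names (local_alignment_score, split_round_robin, score_chunk, search_database) are hypothetical. The point it shows is why the problem needs little communication: each worker receives only the query and its share of the database, and returns only one score per sequence.

```python
from typing import Callable, List, Tuple

Entry = Tuple[str, str]  # (sequence name, sequence); hypothetical record layout


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Optimal local alignment score via the Smith-Waterman recurrence.

    Assumption: linear gap penalty and illustrative scores, chosen for brevity.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def split_round_robin(database: List[Entry], n_workers: int) -> List[List[Entry]]:
    """Deal database entries out to n_workers independent chunks."""
    return [database[i::n_workers] for i in range(n_workers)]


def score_chunk(task: Tuple[str, List[Entry]]) -> List[Tuple[str, int]]:
    """Work done by one worker: score the query against its chunk only."""
    query, chunk = task
    return [(name, local_alignment_score(query, seq)) for name, seq in chunk]


def search_database(query: str, database: List[Entry], n_workers: int = 4,
                    map_fn: Callable = map) -> List[Tuple[str, int]]:
    """Score the query against every entry and return hits, best first.

    map_fn is a stand-in for the distribution layer: the built-in map runs
    serially, while a pool's map method could be passed in to run in parallel.
    """
    tasks = [(query, chunk) for chunk in split_round_robin(database, n_workers)]
    hits = [hit for part in map_fn(score_chunk, tasks) for hit in part]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

For example, search_database("ACGT", [("s1", "TTACGTTT"), ("s2", "GGGG")]) ranks s1 above s2. Because the chunks are independent, adding workers shortens the run without adding communication between them, which is the property the approach relies on.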
