Database Allocation Strategies for Parallel BLAST Evaluation on Clusters

In this work we investigate the parallel evaluation of BLAST, the most popular tool for comparing biological sequences. Our goal is to study database distribution issues that, beyond workload balancing, could improve the performance of a set of BLAST processes running on a workstation cluster. We evaluate different partitioning strategies in actual BLAST executions against a few relevant molecular databases. We have implemented multiple database and input sequence configurations, and we show that many important parameters, such as the fragment generation method and sequence similarities, must be taken into account in order to make full use of such a parallel environment.
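
To illustrate why the fragment generation method can matter, the following minimal sketch contrasts two ways of splitting a sequence database into fragments, one per cluster node. It is not the method evaluated in this work: the function names, the two strategies (round-robin assignment and a greedy size-balanced assignment), and the use of total sequence length as a stand-in for search cost are all assumptions made for illustration.

```python
import heapq


def round_robin_fragments(seq_lengths, n_fragments):
    """Assign sequences to fragments in input order, ignoring their sizes.

    seq_lengths: dict mapping a (hypothetical) sequence id to its length.
    """
    fragments = [[] for _ in range(n_fragments)]
    for i, seq_id in enumerate(seq_lengths):
        fragments[i % n_fragments].append(seq_id)
    return fragments


def size_balanced_fragments(seq_lengths, n_fragments):
    """Greedy balancing: place the longest remaining sequence in the
    fragment that currently holds the fewest residues."""
    # Heap entries are (total residues so far, fragment index).
    heap = [(0, k) for k in range(n_fragments)]
    heapq.heapify(heap)
    fragments = [[] for _ in range(n_fragments)]
    longest_first = sorted(seq_lengths.items(), key=lambda kv: kv[1], reverse=True)
    for seq_id, length in longest_first:
        total, k = heapq.heappop(heap)
        fragments[k].append(seq_id)
        heapq.heappush(heap, (total + length, k))
    return fragments


def fragment_sizes(seq_lengths, fragments):
    """Total number of residues held by each fragment."""
    return [sum(seq_lengths[s] for s in frag) for frag in fragments]


if __name__ == "__main__":
    # Toy lengths, invented for the example.
    lengths = {"a": 900, "b": 100, "c": 800, "d": 200}
    for split in (round_robin_fragments, size_balanced_fragments):
        frags = split(lengths, 2)
        print(split.__name__, frags, fragment_sizes(lengths, frags))
```

Equal fragment sizes do not by themselves guarantee equal search times, since the cost of a BLAST run also depends on how similar the query is to the sequences in each fragment. That is one reason size alone is treated here only as a rough proxy.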
