Enabling HMMER for the Grid with COMP Superscalar

Abstract: The continuously increasing size of biological sequence databases has motivated the development of analysis suites that use parallelization to search such databases faster. However, many of these tools are not suitable for execution on mid-to-large scale parallel infrastructures such as computational Grids. This paper shows how COMP Superscalar can be used to effectively parallelize a sequence analysis program on the Grid. In particular, we present a sequential version of the HMMER hmmpfam tool that, when run with COMP Superscalar, is decomposed into tasks and executed on a set of distributed resources, without burdening the programmer with parallelization effort. Although performance is not a main objective of this work, we also present test results in which COMP Superscalar, using a new pre-scheduling technique, clearly outperforms a well-known parallelization of the hmmpfam algorithm.
