Grid Deployment Of Bioinformatics Applications: A Case Study In Protein Similarity Determination

In this paper we present a scenario for the grid immersion of the procedures that solve the problem of determining protein structural similarity. The emphasis is on how the various computational components and data resources are tied together into a workflow to be executed on a grid. The grid deployment is organized according to the bag-of-service model: a set of different modules, each with its own data set, is made available to application designers. Each module deals with a specific subproblem using a suitable protein data representation. At the design level, task selection produces a first, general workflow that establishes which subproblems need to be solved and their temporal relations. A further refinement selects, for each task identified in this way, a procedure that solves it; the choice is made among the available methods and representations. The final outcome is an instance of the workflow ready for execution on a grid. Our approach to protein structure comparison combines indexing and dynamic programming techniques to achieve fast and reliable matching. All the components have been implemented on a grid infrastructure using Globus, and the overall tool has been tested on proteins chosen from different fold classes. The results are compared against SCOP, a standard classification of known protein structures.
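
The two design steps described above (task selection, then choice of a procedure for each task) can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the task names, procedure names, representations, and the `instantiate` helper are all hypothetical, and the grid submission step (for example through Globus) is left out. The sketch only shows how an abstract workflow with temporal relations might be turned into a concrete instance by picking procedures from a bag of services.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Procedure:
    """One module in the bag of services (all names are hypothetical)."""
    name: str
    task: str            # the subproblem this procedure solves
    representation: str  # the protein data representation it works on


# Hypothetical bag of services: several procedures may solve the same task.
BAG = [
    Procedure("sse_extractor", "extract_features", "secondary_structure"),
    Procedure("ca_trace_extractor", "extract_features", "c_alpha_trace"),
    Procedure("hash_index_lookup", "index_search", "secondary_structure"),
    Procedure("dp_aligner", "align", "secondary_structure"),
    Procedure("dp_aligner_ca", "align", "c_alpha_trace"),
]

# Step 1, task selection: each task maps to the tasks that must precede it.
ABSTRACT_WORKFLOW = {
    "extract_features": set(),
    "index_search": {"extract_features"},
    "align": {"index_search"},
}


def instantiate(workflow, bag, representation):
    """Step 2: bind every task to a procedure using the given representation.

    Returns a list of (task, procedure name) pairs in an order that
    respects the temporal relations of the abstract workflow.
    """
    plan = []
    for task in TopologicalSorter(workflow).static_order():
        candidates = [
            p for p in bag
            if p.task == task and p.representation == representation
        ]
        if not candidates:
            raise LookupError(f"no procedure for {task!r} on {representation!r}")
        plan.append((task, candidates[0].name))  # simplest policy: first match
    return plan


if __name__ == "__main__":
    for task, procedure in instantiate(ABSTRACT_WORKFLOW, BAG, "secondary_structure"):
        print(f"{task} -> {procedure}")
```

Taking the first matching candidate is only a placeholder policy; a real selection step could rank candidates by cost, data locality, or expected accuracy.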
