Porting Biological Applications in Grid: An Experience within the EUChinaGRID Framework

The number of natural protein sequences is vanishingly small compared to the number of proteins that are theoretically possible. Thus, a huge number of protein sequences, defined as "never born proteins" or NBPs, have never been observed in nature. Studying the structural and functional properties of NBPs is a way to improve our knowledge of the fundamental properties that make existing protein sequences so unique. Protein structure prediction tools, combined with large computing resources, make it possible to tackle this problem. The study of NBPs requires the generation of a large library of non-natural protein sequences (10^5–10^7) and the prediction of their three-dimensional structures. On a single CPU, predicting the structures of such a library would take years. However, this is an embarrassingly parallel problem, in which the same computation must be repeated many times on independent inputs, and grid infrastructures make it feasible to approach the problem in an acceptable time frame. Here we describe the setup of a simulation environment within the EUChinaGRID[1] infrastructure that allows non-expert users to exploit grid resources for large-scale protein structure prediction.
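As a minimal sketch of why the problem parallelizes so easily, the following Python fragment generates a small random sequence library and splits it into independent batches. It is an illustration only, not the environment described here: the uniform amino acid composition, the sequence length of 70, the batch size, and the function names `random_sequences` and `split_into_jobs` are all assumptions introduced for the example.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_sequences(count, length, seed):
    """Generate `count` random sequences of `length` residues.

    Assumption: residues are drawn uniformly and independently.
    """
    # CPython's random module is based on the Mersenne Twister generator.
    rng = random.Random(seed)
    return [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(count)
    ]


def split_into_jobs(sequences, job_size):
    """Partition the library into independent batches.

    Each batch could be handed to a separate (hypothetical) grid job,
    since predicting one structure does not depend on any other.
    """
    return [
        sequences[i:i + job_size]
        for i in range(0, len(sequences), job_size)
    ]


# Hypothetical small-scale run; a real library would hold 10^5 to 10^7 sequences.
library = random_sequences(count=1000, length=70, seed=42)
jobs = split_into_jobs(library, job_size=64)
print(len(library), "sequences in", len(jobs), "independent jobs")
```

Because no batch depends on another, the batches can run in any order and on any number of worker nodes, so total wall-clock time shrinks roughly in proportion to the resources available.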

[1] Takuji Nishimura, et al. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator, 1998, TOMC.

[2] E. Myers, et al. Basic local alignment search tool, 1990, Journal of molecular biology.

[3] David Baker, et al. Protein Structure Prediction Using Rosetta, 2004, Numerical Computer Methods, Part D.

[4] Liam J. McGuffin, et al. The PSIPRED protein structure prediction server, 2000, Bioinform.

[5] T. N. Bhat, et al. The Protein Data Bank, 2000, Nucleic Acids Res.