Construct a grid computing environment for bioinformatics

Internet computing and grid technologies promise to change the way we tackle complex problems. They will enable large-scale aggregation and sharing of computational, data, and other resources across institutional boundaries, and harnessing these new technologies effectively will transform scientific disciplines ranging from high-energy physics to the life sciences. The computational analysis of biological sequences is a computation-driven science. Because biological data are growing quickly and the databases that hold them are heterogeneous, a grid system can be used to share and integrate these heterogeneous biological databases. Bioinformatics tools can speed up the analysis of large-scale sequence data, especially sequence alignment. FASTA is a tool for aligning multiple protein or nucleotide sequences. The version of FASTA used here is distributed and parallel: it uses the MPI (Message Passing Interface) message-passing library and runs on distributed workstation clusters as well as on traditional parallel computers. In this paper, a grid computing environment is proposed and constructed on multiple Linux PC clusters using the Globus Toolkit (GT) and Sun Grid Engine (SGE). Experimental results and the performance of the bioinformatics tool on the grid system are also presented.
