My WorkSphere: Integrative work environment for grid-unaware biomedical researchers and applications

Delivering cyberinfrastructure to the general scientific and biomedical research community requires transparent access and ease of use. Applications in systematic modeling of biological processes across scales of time and length demand increasingly sophisticated algorithms and larger, longer simulations. This increased sophistication requires that cyberinfrastructure developers either work closely with application scientists or develop middleware that flattens the learning curve, so that these scientists can use the grid willingly and transparently. Many life sciences researchers prefer to run applications in the grid environment without modification and without knowledge of the specific computational resources being used. Here we report the latest advances in the use of Gfarm-FUSE (Grid Data Farm-Filesystem in Userspace) as a computational data grid, with CSF4 (Community Scheduler Framework 4) as the metascheduler, through a GridSphere portal-based environment termed My WorkSphere. We describe the design and performance of this transparent grid computing environment, using bioinformatics and computational biology applications as examples. All components developed or used are open source and freely available.
