Customized Plug-in Modules in Metascheduler Community Scheduler Framework 4 (CSF4) for Life Sciences Applications

As more and more life science researchers take advantage of grid technologies in their work, demand is growing for a robust yet easy-to-use metascheduler or resource broker. In this paper, we extend the metascheduler CSF4 with a Virtual Job Model (VJM) that synchronizes resource co-allocation for cross-domain parallel jobs. The VJM eliminates deadlocks and improves resource usage for multi-cluster parallel applications compiled with MPICH-G2. Because CSF4 has an extensible scheduler plug-in model, developers can write customized metascheduling policies for life sciences applications; as an example, an array-job scheduler plug-in has been developed for pleasantly parallel applications such as AutoDock and BLAST. The performance of the VJM is evaluated through experiments with mpiBLAST-g2 on a Gfarm data grid testbed. In addition, a CSF4 portlet has been released that provides a graphical user interface for transparent grid access and relies on Gfarm for data staging and simplified data management. The platform is open source, available at sourceforge.net/projects/gcsf/, and has been deployed in life science gateways by projects such as My WorkSphere and the PRAGMA Biosciences Portal. The VJM also provides a foundation for supporting more sophisticated workflows and metascheduling policies in the near future.
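A cross-domain parallel job such as an MPICH-G2 run can only start once every participating cluster has allocated its share of processors. If a job is allowed to hold part of its resources while waiting for the rest, two such jobs can each hold slots the other needs and wait forever. The sketch below illustrates the all-or-nothing co-allocation idea that avoids this hold-and-wait pattern. It is a simplified, synchronous model under stated assumptions: `Cluster` and `co_allocate` are hypothetical names, each cluster is reduced to a count of free slots that can be tentatively held and released, and none of this is CSF4's API or its actual VJM implementation, which works through the local schedulers of each cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical cluster model: a fixed pool of free CPU slots."""
    name: str
    free: int
    held: dict = field(default_factory=dict)  # job id -> slots tentatively held

    def try_hold(self, job_id: str, n: int) -> bool:
        # Tentatively reserve n slots for job_id; the sub-job does not start yet.
        if self.free >= n:
            self.free -= n
            self.held[job_id] = self.held.get(job_id, 0) + n
            return True
        return False

    def release(self, job_id: str) -> None:
        # Return every slot tentatively held for job_id (no-op if none).
        self.free += self.held.pop(job_id, 0)


def co_allocate(job_id: str, request: dict, clusters: dict) -> bool:
    """All-or-nothing co-allocation for one cross-domain parallel job.

    request maps cluster name -> slots needed on that cluster. Either
    every sub-job gets its slots (returns True, the job may start) or
    every partial hold is rolled back (returns False), so the job never
    keeps idle resources while waiting for the rest.
    """
    acquired = []
    for name, n in request.items():
        cluster = clusters[name]
        if not cluster.try_hold(job_id, n):
            # Roll back the holds already obtained on other clusters.
            for got in acquired:
                got.release(job_id)
            return False
        acquired.append(cluster)
    return True


if __name__ == "__main__":
    clusters = {"A": Cluster("A", 8), "B": Cluster("B", 8)}
    # job1 needs 6 slots on each cluster: both are available, so it is placed.
    print(co_allocate("job1", {"A": 6, "B": 6}, clusters))  # True
    # job2 gets 2 slots on B, then fails on A (only 2 free) and releases B.
    print(co_allocate("job2", {"B": 2, "A": 4}, clusters))  # False
    print(clusters["B"].free)  # 2
```

Because a failed attempt immediately releases every partial hold, two jobs competing for the same clusters cannot each end up holding part of what the other needs. A real metascheduler would also queue or retry the failed job and bound how long holds may last; both are omitted from this sketch.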