A Framework for the Design and Reuse of Grid Workflows

Grid workflows can be seen as special scientific workflows involving high-performance and/or high-throughput computational tasks. Much work on grid workflows has focused on improving application performance through schedulers that optimize the use of computational resources and bandwidth. As high-end computing resources become a commodity available to new scientific communities, there is an increasing need to also improve the design and reusability “performance” of scientific workflow systems. To this end, we are developing a framework that supports the design and reuse of grid workflows. Individual workflow components (e.g., for data movement, database querying, job scheduling, and remote execution) are abstracted into a set of generic, reusable tasks. Instantiations of these common tasks can be functionally equivalent atomic components (called actors) or composite components (so-called composite actors or subworkflows). In this way, a grid workflow designer does not have to commit to a particular grid technology when developing a scientific workflow; instead, different technologies (e.g., GridFTP, SRB, and scp) can be used interchangeably and in concert. We illustrate the application of our framework using two real-world grid workflows from different scientific domains, namely cheminformatics and bioinformatics.
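
The idea of a generic task with interchangeable atomic and composite instantiations can be illustrated with a minimal Python sketch. It is not the framework's actual API: the names `DataMovementTask`, `ScpActor`, `GridFtpActor`, `StagedTransfer`, and `stage_input` are hypothetical, and the actors only describe the transfer they would perform rather than invoking any real tool.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


class DataMovementTask(ABC):
    """Generic, technology-neutral task: move data from src to dst (hypothetical)."""

    @abstractmethod
    def plan(self, src: str, dst: str) -> List[str]:
        """Return a description of the steps; nothing is actually executed."""


class ScpActor(DataMovementTask):
    """Atomic actor standing in for an scp-based transfer."""

    def plan(self, src: str, dst: str) -> List[str]:
        return [f"scp: {src} -> {dst}"]


class GridFtpActor(DataMovementTask):
    """Atomic actor standing in for a GridFTP-based transfer."""

    def plan(self, src: str, dst: str) -> List[str]:
        return [f"gridftp: {src} -> {dst}"]


@dataclass
class StagedTransfer(DataMovementTask):
    """Composite actor (subworkflow): chains two actors via a staging location."""

    first: DataMovementTask
    second: DataMovementTask
    staging: str

    def plan(self, src: str, dst: str) -> List[str]:
        return self.first.plan(src, self.staging) + self.second.plan(self.staging, dst)


def stage_input(mover: DataMovementTask, src: str, dst: str) -> List[str]:
    """Workflow step written only against the generic task."""
    return mover.plan(src, dst)


if __name__ == "__main__":
    # Swap technologies without touching the workflow step.
    print(stage_input(ScpActor(), "input.dat", "cluster:/work/input.dat"))
    composite = StagedTransfer(GridFtpActor(), ScpActor(), "gateway:/tmp/input.dat")
    print(stage_input(composite, "input.dat", "cluster:/work/input.dat"))
```

Because `stage_input` depends only on the abstract task, an atomic actor and a composite one that uses two technologies in concert are interchangeable from the workflow designer's point of view.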
