Constructing workflows from script applications

For programming and executing complex applications on grid infrastructures, scientific workflows have been proposed as a convenient high-level alternative to solutions based on general-purpose programming languages, APIs and scripts. GridSpace is a collaborative programming and execution environment based on a scripting approach: it extends the Ruby language with a high-level API for invoking operations on remote resources. In this paper we describe a tool that converts GridSpace application source code into a workflow representation which, in turn, may be used for scheduling, provenance, or visualization. We describe how we addressed the problems of analyzing Ruby source code, resolving variable and method dependencies, and building the workflow representation. We evaluated our solutions by applying them to complex grid application workflows such as CyberShake, Epigenomics and Montage. The evaluation also covers the representation of typical workflow control-flow patterns.
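
To make the idea of resolving variable dependencies concrete, the following is a minimal sketch and not the tool described in this paper. It assumes a straight-line script made only of top-level assignments, and it uses Ripper, the parser shipped with Ruby's standard library, to find which earlier statement produced each variable that a later statement reads. The helper names `variables_read` and `dependency_edges`, as well as the operations `stage_in`, `process` and `merge`, are hypothetical and chosen only for illustration.

```ruby
require 'ripper'

# Collect the names of all local variables read inside an S-expression node.
# Ripper marks a read of a known local variable as [:var_ref, [:@ident, name, pos]].
def variables_read(node, found = [])
  return found unless node.is_a?(Array)

  if node[0] == :var_ref && node[1].is_a?(Array) && node[1][0] == :@ident
    found << node[1][1]
  else
    node.each { |child| variables_read(child, found) }
  end
  found
end

# Return data-dependency edges [producer_index, consumer_index] between
# top-level assignment statements (hypothetical helper; handles only
# straight-line code, no branches, loops or method definitions).
def dependency_edges(source)
  program = Ripper.sexp(source)
  raise ArgumentError, 'source does not parse' if program.nil?

  producer = {} # variable name => index of the statement that last assigned it
  edges = []
  program[1].each_with_index do |stmt, index|
    next unless stmt.is_a?(Array) && stmt[0] == :assign

    target = stmt[1]
    rhs = stmt[2]
    # Every variable read on the right-hand side depends on its last producer.
    variables_read(rhs).uniq.each do |name|
      edges << [producer[name], index] if producer.key?(name)
    end
    # Record this statement as the new producer of the assigned variable.
    if target[0] == :var_field && target[1].is_a?(Array) && target[1][0] == :@ident
      producer[target[1][1]] = index
    end
  end
  edges
end

# Usage: a diamond-shaped script with hypothetical operations.
script = <<~RUBY
  a = stage_in(1)
  b = process(a)
  c = process(a)
  d = merge(b, c)
RUBY
p dependency_edges(script)
```

Each edge is a pair of zero-based statement indices, so the diamond-shaped script yields edges from the first statement to the second and third, and from those two to the fourth. A real converter would additionally have to handle control flow and method calls, which this sketch deliberately leaves out.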
