Interoperability of heterogeneous large-scale scientific workflows and data resources

Workflows allow e-Scientists to express their experimental processes in a structured way and provide the glue for integrating remote applications. Because the Grid provides very large amounts of data and computational resources, executing workflows on the Grid yields significant performance improvements. Several workflow management systems have been developed for various purposes and are widely used by different scientific communities; as a result, they differ in several respects. This thesis outlines two major problems of existing workflow systems: workflow interoperability and data access.

On the one hand, existing workflow systems are based on different technologies, so achieving interoperability between their workflows at any level is a challenging task. There is a clear demand for interoperable workflows, for example to enable scientists to share workflows, to leverage the existing work of others, and to create multi-disciplinary workflows. Nevertheless, only limited, ad hoc workflow interoperability solutions are currently available to scientists. Existing solutions realise workflow interoperability only between a small set of workflow systems and do not consider the performance issues that arise with large-scale (computation- and/or data-intensive) scientific workflows. Such workflows are typically executed in a distributed environment to reduce their execution time, so performance is a key issue. In most scenarios, existing interoperability solutions create a bottleneck in the communication between workflows, dramatically increasing execution time.

On the other hand, many scientific computational experiments are based on data that reside in data resources of different types and from different vendors. Many workflow systems support access to only limited subsets of such data resources, which prevents data-level workflow interoperation between different systems. There is therefore a demand for a general solution that provides access to a wide range of data resources of different types and vendors. If such a solution is general, in the sense that it can be adopted by several workflow systems, then it also enables workflows of different systems to access the same data resources and therefore to interoperate at the data level. Note that data semantics are outside the scope of this work. For the same reasons as described above, the performance characteristics of such a solution are critically important. Although some existing solutions could, in terms of functionality, be adopted by workflow systems for this purpose, they provide poor performance and have therefore not gained wide acceptance in the scientific workflow community.

To address these issues, a set of architectures is proposed to realise heterogeneous data access and heterogeneous workflow execution solutions. The primary goal was to investigate how such solutions can be implemented and integrated with workflow systems. The secondary aim was to analyse how such solutions can be implemented and utilised by single applications.
