Object reuse and exchange for publishing and sharing workflows

The workflow paradigm can describe the complete functional pipeline of a scientific experiment and can therefore expose the underlying scientific processes, enabling results to be reproduced. However, current means of exposing such information are tied closely to individual workflow engines, and no existing method provides a common way to share it. In this paper, we discuss a lightweight approach that uses the Open Archives Initiative Object Reuse and Exchange (ORE) standard to expose this information, providing a common format for representing and sharing workflows together with the metadata required to execute them. We describe how workflows can be mapped to the ORE format using RDF and how they can be stored as bundles for sharing with others. We discuss the tooling we have developed that allows existing workflow engines to export workflows conveniently as ORE bundles. We present three use cases, for Triana, ASKALON and MOTEUR, in which such integration has already been undertaken, and conclude with a short study showing that the overhead of adopting the proposed ORE bundling format is minimal.
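The mapping of a workflow onto ORE and its storage as a bundle can be pictured with a small sketch. It is a hypothetical illustration and not the format defined in this paper. An `ore:ResourceMap` `ore:describes` an `ore:Aggregation`, which `ore:aggregates` each file needed to run the workflow, such as its definition and input parameters. The map is serialised as N-Triples and zipped together with those files. The ORE, Dublin Core and RDF vocabulary IRIs are the published ones. The base URI, the file names, the `resourcemap.nt` name and the flat zip layout are assumptions made for the example.

```python
# Hypothetical sketch: represent a workflow and its execution metadata as an
# OAI-ORE aggregation serialised in RDF (N-Triples), then pack the files and
# the resource map into one zip "bundle". The URIs, file names and bundle
# layout are illustrative assumptions, not the format defined in the paper.
import io
import zipfile
from typing import Dict, Iterable, List

ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
MAP_NAME = "resourcemap.nt"  # hypothetical name for the map inside the bundle


def uri(iri: str) -> str:
    """Format an IRI as an N-Triples term."""
    return f"<{iri}>"


def literal(text: str) -> str:
    """Format a string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def resource_map_triples(base: str, files: Iterable[str],
                         creator: str, modified: str) -> List[str]:
    """Return N-Triples lines in which a resource map describes an aggregation
    that aggregates every file of the workflow (definition, inputs, config)."""
    rem, agg = f"{base}/{MAP_NAME}", f"{base}/aggregation"
    triples = [
        (uri(rem), uri(RDF_TYPE), uri(ORE + "ResourceMap")),
        (uri(rem), uri(ORE + "describes"), uri(agg)),
        (uri(rem), uri(DCTERMS + "creator"), literal(creator)),
        (uri(rem), uri(DCTERMS + "modified"),
         f"{literal(modified)}^^{uri(XSD_DATETIME)}"),
        (uri(agg), uri(RDF_TYPE), uri(ORE + "Aggregation")),
    ]
    for name in files:
        res = uri(f"{base}/{name}")
        triples.append((uri(agg), uri(ORE + "aggregates"), res))
        triples.append((res, uri(RDF_TYPE), uri(ORE + "AggregatedResource")))
    return [f"{s} {p} {o} ." for s, p, o in triples]


def build_bundle(base: str, files: Dict[str, str],
                 creator: str, modified: str) -> bytes:
    """Zip the workflow files together with their ORE resource map."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
        triples = resource_map_triples(base, files, creator, modified)
        zf.writestr(MAP_NAME, "\n".join(triples) + "\n")
    return buf.getvalue()


if __name__ == "__main__":
    bundle = build_bundle(
        "http://example.org/bundles/demo",  # hypothetical base URI
        {"workflow.xml": "<workflow/>", "inputs/params.txt": "n=10\n"},
        creator="Example Lab",
        modified="2010-06-01T12:00:00Z",
    )
    with zipfile.ZipFile(io.BytesIO(bundle)) as zf:
        print(zf.namelist())
        print(zf.read(MAP_NAME).decode())
```

N-Triples is used here only to keep the sketch free of third-party dependencies. An RDF library such as rdflib could serialise the same triples as RDF/XML or Turtle instead.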
