Scientific Process Automation and Workflow Management

We introduce and describe scientific workflows, i.e., executable descriptions of automatable scientific processes such as computational science simulations and data analyses. Scientific workflows are often expressed in terms of tasks and the dataflow dependencies between them. This chapter first provides an overview of the characteristic features of scientific workflows and outlines their life cycle. A detailed case study then highlights workflow challenges and solutions in simulation management. Next, we briefly survey how some concrete systems support the various phases of the workflow life cycle, i.e., design, resource management, execution, and provenance management. We conclude with a discussion of community-based workflow sharing.
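To make "tasks and their dataflow dependencies" concrete, here is a minimal sketch. It is not the model used by any particular workflow system. A workflow is represented as a table of named tasks plus a map from each task to the upstream tasks whose outputs it consumes. Running the tasks in topological order guarantees that each task fires only after its inputs exist, and logging each step yields a rudimentary provenance record. The names `run_workflow`, `tasks`, and `deps`, and the toy "simulate/mean/max/report" pipeline, are hypothetical illustrations. The sketch relies on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a task is any callable whose positional
# arguments are the outputs of its upstream tasks, in the listed order.
Task = Callable[..., object]


def run_workflow(tasks: Dict[str, Task], deps: Dict[str, List[str]]):
    """Run tasks in a dependency-respecting order.

    Returns the output of every task and a simple provenance log of
    (task name, names of the tasks whose outputs it consumed).
    """
    # graphlib expects a mapping from each node to its predecessors.
    sorter = TopologicalSorter({name: deps.get(name, []) for name in tasks})
    results: Dict[str, object] = {}
    provenance: List[Tuple[str, Tuple[str, ...]]] = []
    for name in sorter.static_order():  # predecessors always come first
        upstream = deps.get(name, [])
        inputs = [results[d] for d in upstream]
        results[name] = tasks[name](*inputs)
        provenance.append((name, tuple(upstream)))
    return results, provenance


# Toy pipeline: a "simulation" produces data, two analyses consume it,
# and a report combines the analysis results.
tasks = {
    "simulate": lambda: [1.0, 2.0, 3.0],
    "mean": lambda xs: sum(xs) / len(xs),
    "max": lambda xs: max(xs),
    "report": lambda m, mx: f"mean={m}, max={mx}",
}
deps = {"mean": ["simulate"], "max": ["simulate"], "report": ["mean", "max"]}

results, prov = run_workflow(tasks, deps)
print(results["report"])  # mean=2.0, max=3.0
```

Because "mean" and "max" depend only on "simulate", a real workflow engine could run them concurrently. This sequential sketch simply fixes one valid order.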
