Lifecycle of Scientific Workflows and their Provenance: A Usage Perspective

Scientific workflows represent one, and sometimes several, of the processes in the scientific method. They combine data and computational procedures into a configurable, structured set of steps that implement semi-automated computational solutions to a scientific problem. Each atomic step in a scientific workflow uses a technology to carry out the computation or data processing. Thus, as technology progresses, the requirements and motivations for using scientific workflows evolve with it. The first part of this invited talk explores how the use of scientific workflows has evolved from the late 1990s to date and discusses the advantages gained from this usage. The second part delves into the current and expected advantages of scientific workflow systems as they mature from art to commodity, with a focus on the provenance of scientific workflows and related products.

As recent workshops, challenges, and community interest have shown, capturing provenance information for computational experiments and simulations is a significant advantage of using scientific workflows to conduct computational studies. Many scientific workflow systems today provide provenance recording functionality. However, the lack of generic tools for working with the collected information limits how widely different users can exploit this functionality. The talk concludes with a vision for a provenance framework that supports the successive steps in the lifecycle of provenance information, from data collection, which could serve multiple data and computational models, through to the uses of the collected data.