Incremental Workflow Improvement Through Analysis of Its Data Provenance

Resource-intensive workflows are commonly executed many times over in e-science practice. We explore the hypothesis that, in some cases, provenance traces recorded for past runs of a workflow can be used to make future runs more efficient. This investigation is an initial step towards the systematic study of the role that provenance analysis can play in the broader context of self-managing software systems. We have tested our hypothesis on a concrete case study involving a Chemical Engineering workflow deployed on a cloud infrastructure, where we can measure the cost of its repeated execution. Our approach augments the workflow with a feedback loop in which incremental analysis of the provenance of past runs is used to control some of the workflow steps in subsequent executions. We present initial experimental results and outline future improvements that are part of ongoing work.

© 2011 Newcastle University. Printed and published by Newcastle University, Computing Science, Claremont Tower, Claremont Road, Newcastle upon Tyne, NE1 7RU, England.
