Towards Case-Based Support for e-Science Workflow Generation by Mining Provenance

e-Science brings large-scale computation to bear on scientific problems, often by performing sequences of computational tasks organized into workflows and executed on distributed Web resources. Sophisticated AI tools have been developed that apply knowledge-rich methods to compose scientific workflows by generative planning, but the required knowledge can be difficult to acquire. Current work by the cyberinfrastructure community aims to routinely capture provenance during workflow execution, which would provide a new experience-based knowledge source for workflow generation: large-scale databases of workflow execution traces. This paper proposes exploiting these databases with a "knowledge light" approach to reuse, applying CBR methods to those traces to support scientists' workflow generation process. It introduces e-Science workflows as a CBR domain, sketches key technical issues, and illustrates directions for addressing these issues through ongoing research on Phala, a system that supports workflow generation by aiding reuse of portions of prior workflows. Experiments using workflow data collected by the myGrid and myExperiment projects suggest that Phala's methods have promise for assisting workflow composition in the context of scientific experimentation.
