Addressing the provenance challenge using ZOOM

ZOOM*UserViews presents a model of provenance for scientific workflows that is simple and generic, yet expressive enough to answer the questions of data and step provenance encountered in a wide variety of scientific case studies. In addition, ZOOM builds on the concept of composite step-classes (or sub-workflows), which is present in many scientific workflow systems, to develop a notion of user views. This paper discusses the design and implementation of ZOOM in the context of the queries posed by the provenance challenge, and shows how user views affect the level of granularity at which provenance information can be seen and reasoned about. Copyright © 2007 John Wiley & Sons, Ltd.
