Capturing Provenance in the Wild

All current provenance systems are “closed world” systems: provenance is collected within the confines of a well-understood, pre-planned system. However, when users compose services from heterogeneous systems and organizations to form a new application, existing approaches cannot track provenance across the resulting system. In this work, we describe an approach that composes multiple provenance-unaware services in an “open world” system and still collects provenance information about their execution. Our approach is implemented using the PLUS provenance system and the open-source MULE Enterprise Service Bus. Our evaluations show that this approach is scalable and has minimal overhead.
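The abstract does not spell out the capture mechanism. As a rough illustration of the general idea, a bus that routes every message between services can record provenance on their behalf, so the services themselves need no provenance logic. The Python sketch below assumes a toy in-process bus. The `Bus`, `Message`, and `ProvenanceStore` classes and the two stub services (`geocode`, `weather`) are hypothetical and are not the PLUS or MULE APIs.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Message:
    payload: object
    # Each message gets a unique id so the bus can link inputs to outputs.
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class ProvenanceStore:
    # Hypothetical store: each edge is (input message id, service name, output message id).
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def record(self, src: str, service: str, dst: str) -> None:
        self.edges.append((src, service, dst))

    def lineage(self, msg_id: str) -> List[str]:
        # Map each output message to the input and service that produced it.
        producer = {dst: (src, svc) for src, svc, dst in self.edges}
        steps: List[str] = []
        # Walk backwards until we reach a message no service produced.
        while msg_id in producer:
            msg_id, svc = producer[msg_id]
            steps.append(svc)
        # Return services in invocation order (oldest first).
        return list(reversed(steps))


class Bus:
    """Toy message bus that captures provenance for provenance-unaware services."""

    def __init__(self, store: ProvenanceStore) -> None:
        self.store = store
        self.services: Dict[str, Callable[[object], object]] = {}

    def register(self, name: str, fn: Callable[[object], object]) -> None:
        self.services[name] = fn

    def send(self, name: str, msg: Message) -> Message:
        # The service sees only the payload and knows nothing about provenance.
        out = Message(self.services[name](msg.payload))
        # The bus, not the service, records the input -> service -> output edge.
        self.store.record(msg.id, name, out.id)
        return out


# Usage with two hypothetical stub services chained through the bus.
store = ProvenanceStore()
bus = Bus(store)
bus.register("geocode", lambda addr: (42.0, -71.0))
bus.register("weather", lambda coords: f"sunny at {coords}")

m0 = Message("1 Main St")
m1 = bus.send("geocode", m0)
m2 = bus.send("weather", m1)
print(store.lineage(m2.id))  # ['geocode', 'weather']
```

Neither stub contains any provenance code; the lineage of `m2` is reconstructed entirely from edges the bus wrote while routing messages. In the setting the abstract describes, this recording role would presumably fall to the enterprise service bus, with the edges persisted in a provenance store such as PLUS.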
