Getting It Together: Enabling Multi-organization Provenance Exchange

We present an architecture that supports provenance queries in large, dynamic, multi-organizational environments. The Provenance Challenges have explored exchange across disparate provenance systems, but that exchange is only a first step. We outline requirements for multi-organizational provenance, evaluate candidate architectures, describe the approach implemented in the PLUS prototype provenance manager, and present performance results indicating that the approach scales.
