PLUS: A provenance manager for integrated information

It can be difficult to fully understand the result of integrating information from diverse sources. When all the information comes from a single organization, there is collective knowledge about where it came from and whether it can be trusted. Once information from multiple organizations is integrated, however, that shared knowledge of the data and its quality no longer exists. It is often impossible to view and judge information from a different organization, and when errors occur, notification does not always reach all users of the data. We describe how a multi-organizational provenance store that collects provenance from heterogeneous systems addresses these problems. Unlike most provenance systems, ours copes with an open world, in which data usage is not determined in advance and can take place across many systems and organizations.
