Layering in Provenance Systems

Digital provenance describes the ancestry or history of a digital object. Most existing provenance systems, however, operate at only one level of abstraction: the system call layer, a workflow specification, or the high-level constructs of a particular application. The provenance collectable in each of these layers is different, and all of it can be important. Single-layer systems fail to account for the different levels of abstraction at which users need to reason about their data and processes. These systems cannot integrate data provenance across layers and cannot answer questions that require an integrated view of the provenance. We have designed a provenance collection structure facilitating the integration of provenance across multiple levels of abstraction, including a workflow engine, a web browser, and an initial runtime Python provenance tracking wrapper. We layer these components atop provenance-aware network storage (NFS) that builds upon a Provenance-Aware Storage System (PASS). We discuss the challenges of building systems that integrate provenance across multiple layers of abstraction, present how we augmented systems in each layer to integrate provenance, and present use cases that demonstrate how provenance spanning multiple layers provides functionality not available in existing systems. Our evaluation shows that the overheads imposed by layering provenance systems are reasonable.

[1]  Randy H. Katz,et al.  X-Trace: A Pervasive Network Tracing Framework , 2007, NSDI.

[2]  James Newsome,et al.  Dynamic Taint Analysis for Automatic Detection, Analysis, and SignatureGeneration of Exploits on Commodity Software , 2005, NDSS.

[3]  Margo I. Seltzer,et al.  Choosing a Data Model and Query Language for Provenance , 2008, IPAW 2008.

[4]  Erez Zadok,et al.  Selective Versioning in a Secure Disk System , 2008, USENIX Security Symposium.

[5]  Ian T. Foster,et al.  Globus: a Metacomputing Infrastructure Toolkit , 1997, Int. J. High Perform. Comput. Appl..

[6]  Ian Foster,et al.  The First Provenance Challenge , 2008 .

[7]  Jennifer Widom,et al.  The Lorel query language for semistructured data , 1997, International Journal on Digital Libraries.

[8]  Amin Vahdat,et al.  Transparent Result Caching , 1997, USENIX Annual Technical Conference.

[9]  Margo I. Seltzer,et al.  Provenance-Aware Storage Systems , 2006, USENIX ATC, General Track.

[10]  Heng Yin,et al.  Panorama: capturing system-wide information flow for malware detection and analysis , 2007, CCS '07.

[11]  James Frew,et al.  Composing lineage metadata with XML for custom satellite-derived data products , 2004, Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004..

[12]  Eddie Kohler,et al.  Making information flow explicit in HiStar , 2006, OSDI '06.

[13]  Ian T. Foster,et al.  The virtual data grid: a new model and architecture for data-intensive collaboration , 2003, 15th International Conference on Scientific and Statistical Database Management, 2003..

[14]  Ilkay Altintas,et al.  Provenance Collection Support in the Kepler Scientific Workflow System , 2006, IPAW.

[15]  Margo I. Seltzer,et al.  Securing Provenance , 2008, HotSec.

[16]  Jennifer Widom,et al.  Trio: A System for Integrated Management of Data, Accuracy, and Lineage , 2004, CIDR.

[17]  Alan L. Cox,et al.  Causeway: Operating System Support for Controlling and Analyzing the Execution of Distributed Programs , 2005, HotOS.

[18]  Eyal de Lara,et al.  The taser intrusion recovery system , 2005, SOSP '05.

[19]  Steve Vandebogart,et al.  Labels and event processes in the Asbestos operating system , 2005, TOCS.

[20]  Eddie Kohler,et al.  Information flow control for standard OS abstractions , 2007, SOSP.

[21]  Margo I. Seltzer,et al.  The Case for Browser Provenance , 2009, Workshop on the Theory and Practice of Provenance.

[22]  Allan Heydon,et al.  The Vesta Approach to Software Configuration Management , 2001 .

[23]  Kiran-Kumar Muniswamy-Reddy,et al.  Causality-based versioning , 2009, TOS.

[24]  Michael Austin Halcrow eCryptfs: An Enterprise-class Encrypted Filesystem for Linux , 2010 .

[25]  Marianne Winslett,et al.  The Case of the Fake Picasso: Preventing History Forgery with Secure Provenance , 2009, FAST.