Provenance-Aware Storage Systems

A Provenance-Aware Storage System (PASS) is a storage system that automatically collects and maintains provenance, or lineage: the complete history or ancestry of an item. We discuss the advantages of treating provenance as metadata collected and maintained by the storage system, rather than as manual annotations stored in a separately administered database. We describe a PASS implementation, discussing the challenges it presents, the performance cost it incurs, and the new functionality it enables. We show that, with reasonable overhead, we can provide useful functionality not available in today's file systems or provenance management systems.
