Provenance as first-class cloud data

Digital provenance is metadata that describes the ancestry or history of a digital object. Most work on provenance focuses on how provenance increases the value of data to consumers. However, provenance is also valuable to storage providers. For example, provenance can provide hints about access patterns, help detect anomalous behavior, and enhance user search capabilities. As the next generation of storage providers, cloud vendors are in a unique position to capitalize on this opportunity by incorporating provenance as a fundamental storage system primitive. To date, no cloud offering has done so. We motivate providers to treat provenance as first-class data in the cloud and, based on our experience with provenance in a local storage system, suggest a set of requirements that make provenance feasible and attractive.
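
To make the definition concrete, the ancestry of an object can be modeled as a directed acyclic graph in which each object points to the inputs it was derived from. The sketch below is a minimal, hypothetical in-memory illustration; the class `ProvenanceGraph` and its methods `record` and `ancestors` are names we introduce for this example only, not the interface of any existing system, and a real provenance-aware store would persist and index this graph alongside the data.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Hypothetical in-memory provenance store (illustration only).

    Each object maps to the set of objects it was directly derived from.
    """

    def __init__(self):
        # child object -> set of direct parents (the inputs it was derived from)
        self._parents = defaultdict(set)

    def record(self, output, *inputs):
        """Record that `output` was derived from each object in `inputs`."""
        self._parents[output].update(inputs)

    def ancestors(self, obj):
        """Return every object in the ancestry of `obj` (transitive closure of parents)."""
        seen = set()
        # .get() avoids creating empty entries in the defaultdict for unknown objects
        stack = list(self._parents.get(obj, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self._parents.get(node, ()))
        return seen


g = ProvenanceGraph()
g.record("report.pdf", "report.tex", "figure.png")
g.record("figure.png", "plot.py", "data.csv")
print(sorted(g.ancestors("report.pdf")))
# ['data.csv', 'figure.png', 'plot.py', 'report.tex']
```

An ancestry query of this kind is one way a provider might, for instance, derive the access-pattern hints mentioned above, such as which inputs tend to be read together with an output; this use is an assumption for illustration.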
