Provenance for the Cloud

The cloud is poised to become the next computing environment for both data storage and computation because of its pay-as-you-go and provision-as-you-go models. Cloud storage is already used to back up desktop user data, host shared scientific data, store web application data, and serve web pages. Today's cloud stores, however, are missing an important ingredient: provenance. Provenance is metadata that describes the history of an object. We make the case that provenance is crucial for data stored in the cloud and identify the properties of provenance that make it useful. We then examine current cloud offerings and design and implement three protocols for maintaining data and provenance in current cloud stores. The protocols represent different points in the design space and satisfy different subsets of the provenance properties. Our evaluation indicates that the overheads of all three protocols are comparable to one another and reasonable in absolute terms. Thus, one can select a protocol based on the properties it provides without sacrificing performance. While it is feasible to provide provenance as a layer on top of today's cloud offerings, we conclude by making the case for incorporating provenance as a core cloud feature and discussing the issues in doing so.
