The Case of the Fake Picasso: Preventing History Forgery with Secure Provenance

As more valuable information is produced and kept in digital form, the ability to determine the origin of data becomes increasingly important. In science, medicine, commerce, and government, data provenance tracking is essential for rights protection, regulatory compliance, management of intelligence and medical data, and authentication of information as it flows through workplace tasks. In this paper, we show how to provide strong integrity and confidentiality assurances for data provenance information. We describe our provenance-aware system prototype, which tracks the provenance of data writes at the application layer and is therefore extremely easy to deploy. We present empirical results showing that, for typical real-life workloads, the run-time overhead of recording provenance with confidentiality and integrity guarantees ranges from 1% to 13%.
