Preventing history forgery with secure provenance

As increasing amounts of valuable information are produced and persist digitally, the ability to determine the origin of data becomes important. In science, medicine, commerce, and government, data provenance tracking is essential for rights protection, regulatory compliance, management of intelligence and medical data, and authentication of information as it flows through workplace tasks. While significant research has been conducted in this area, the associated security and privacy issues have not been explored, leaving provenance information vulnerable to illicit alteration as it passes through untrusted environments. In this article, we show how to provide strong integrity and confidentiality assurances for data provenance information at the kernel, file system, or application layer. We describe Sprov, our provenance-aware system prototype, which implements provenance tracking of data writes at the application layer and is therefore extremely easy to deploy. We present empirical results showing that, for real-life workloads, the runtime overhead of Sprov for recording provenance with confidentiality and integrity guarantees ranges from 1% to 13% when all file modifications are recorded, and from 12% to 16% when all file reads and modifications are tracked.
