A roadmap for privacy-enhanced secure data provenance

The notion of data provenance was formally introduced a decade ago and has since been investigated, but mainly from a functional perspective. This follows the historical pattern of introducing new technologies with the expectation that security and privacy can be added later. Despite very recent interest from the cyber security community in some specific aspects of data provenance, there is no long-haul, overarching, systematic framework for the security and privacy of provenance. The importance of secure provenance R&D has been emphasized in the recent report on Federal game-changing R&D for cyber security, especially with respect to the theme of Tailored Trustworthy Spaces. Secure data provenance can significantly enhance data trustworthiness, which is crucial to various decision-making processes. Moreover, data provenance can facilitate accountability and compliance (including compliance with the privacy preferences and policies of relevant users), can be an important factor in access control and usage control decisions, and can be valuable in data forensics. Along with these potential benefits, data provenance also poses a number of security and privacy challenges. For example, provenance sometimes needs to be confidential, so that it is visible only to properly authorized users, and the identities of entities in the provenance must be protected from exposure. We thus need to achieve high assurance of provenance without compromising the privacy of those in the chain that produced the data. Moreover, if we expect voluntary large-scale participation in provenance-aware applications, we must ensure that the privacy of the individuals or organizations involved will be maintained. It is incumbent on the cyber security community to develop a technical and scientific framework to address the security and privacy challenges so that our society can gain maximum benefit from this technology.
In this paper, we discuss a framework of theoretical foundations, models, mechanisms and architectures that allows applications to benefit from privacy-enhanced and secure use of provenance in a modular fashion. After introducing the main components of such a framework and the notion of the provenance life cycle, we discuss in detail the research questions and issues concerning each component, along with related approaches.
