Privacy issues in scientific workflow provenance

A scientific workflow often involves proprietary modules as well as private or confidential data, such as health or medical information. Hence, providing exact answers to provenance queries over all executions of the workflow may reveal private information. In this paper, we first study three potential privacy issues in a scientific workflow: module privacy, data privacy, and provenance privacy. We then frame several natural questions: (i) can we formally analyze module, data, or provenance privacy and give provable privacy guarantees for a bounded or unlimited number of provenance queries? (ii) how can we answer provenance queries so that the user receives as much information as possible while the required privacy is still guaranteed? Next, we look at module privacy in detail and propose a formal model from our recent work in [11]. Finally, we point to several directions for future work.

[1] Elisa Bertino, et al. State-of-the-art in privacy preserving data mining, 2004, SGMD.

[2] Philip S. Yu, et al. Privacy-Preserving Data Mining - Models and Algorithms, 2008, Advances in Database Systems.

[3] Bertram Ludäscher, et al. Actor-Oriented Design of Scientific Workflows, 2005, ER.

[4] Shiyong Lu, et al. Scientific Workflow Provenance Querying with Security Views, 2008, 2008 The Ninth International Conference on Web-Age Information Management.

[5] Irit Dinur, et al. Revealing information while preserving privacy, 2003, PODS.

[6] Cláudio T. Silva, et al. Managing Rapidly-Evolving Scientific Workflows, 2006, IPAW.

[7] Susan B. Davidson, et al. Detecting and resolving unsound workflow views for correct provenance analysis, 2009, SIGMOD Conference.

[8] Cynthia Dwork, et al. The Differential Privacy Frontier (Extended Abstract), 2009, TCC.

[9] Cynthia Dwork, et al. Differential Privacy: A Survey of Results, 2008, TAMC.

[10] Latanya Sweeney, et al. k-Anonymity: A Model for Protecting Privacy, 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[11] Andrew P. Martin, et al. Trusted Computing and Provenance: Better Together, 2010, TaPP.

[12] Rajeev Motwani, et al. Towards robustness in query auditing, 2006, VLDB.

[13] Nabil R. Adam, et al. Security-control methods for statistical databases: a comparative study, 1989, ACM Comput. Surv.

[14] Matthew R. Pocock, et al. Taverna: a tool for the composition and enactment of bioinformatics workflows, 2004, Bioinform.

[15] Marianne Winslett, et al. Introducing secure provenance: problems and challenges, 2007, StorageSS '07.

[16] Yolanda Gil, et al. Privacy enforcement in data analysis workflows, 2007.

[17] Alina Campan, et al. A Clustering Approach for Data and Structural Anonymity in Social Networks, 2008.

[18] Sanjeev Khanna, et al. Optimizing user views for workflows, 2009, ICDT '09.

[19] Margo I. Seltzer, et al. Securing Provenance, 2008, HotSec.

[20] Debmalya Panigrahi, et al. Preserving Module Privacy in Workflow Provenance, 2010, ArXiv.

[21] Ashwin Machanavajjhala, et al. L-diversity: privacy beyond k-anonymity, 2006, 22nd International Conference on Data Engineering (ICDE'06).

[22] Samir Khuller, et al. Achieving anonymity via clustering, 2006, PODS '06.

[23] Rajeev Motwani, et al. Link Privacy in Social Networks, 2008, ICDE.

[24] Carmem S. Hara, et al. Querying and Managing Provenance through User Views in Scientific Workflows, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[25] Dan Suciu, et al. Relationship privacy: output perturbation for queries with joins, 2009, PODS.

[26] Yolanda Gil, et al. Reasoning about the Appropriate Use of Private Data through Computational Workflows, 2010, AAAI Spring Symposium: Intelligent Information Privacy Management.

[27] Rajeev Motwani, et al. Auditing SQL Queries, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[28] Jon M. Kleinberg, et al. Wherefore art thou R3579X?, 2011, Commun. ACM.