Generalization Based Privacy-Preserving Provenance Publishing

With the growth of data sharing, the demand for publishing data provenance has become increasingly urgent. Data provenance describes how data is generated and how it evolves over time. It has many applications, including data quality evaluation, audit trails, replication recipes, and data citation. Some input-output mappings and related intermediate parameters in data provenance may be private, and how to protect privacy when publishing provenance has attracted increasing attention from researchers in recent years. Existing solutions rely primarily on the Γ-privacy model, which hides certain attributes to preserve module privacy. However, the Γ-privacy model has the following disadvantages: (1) the attribute domains are limited; (2) it is difficult to set a consistent Γ value for the whole workflow; (3) the attribute selection strategy is unreasonable. To address these problems, a novel privacy-preserving provenance model is devised to balance privacy against the utility of data provenance. The model applies generalization and introduces the notion of a generalization level. Furthermore, an effective generalization-based privacy-preserving provenance publishing method is proposed to achieve privacy protection when publishing data provenance. Finally, theoretical analysis and experimental results demonstrate the effectiveness of our solution.
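To make the idea of generalization with a generalization level concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes a hand-written value hierarchy for a single module input attribute; the names `AGE_HIERARCHY`, `ROWS`, `generalize`, and `publish` are hypothetical and introduced only for illustration. Instead of hiding an attribute entirely, each value is replaced by its ancestor at the chosen level, so a higher level gives more privacy and less utility.

```python
from typing import Dict, List, Tuple

# Hypothetical generalization hierarchy for one input attribute.
# Each value maps to its chain of ancestors: level 0 is the exact
# value, and the last entry is the most general one.
AGE_HIERARCHY: Dict[int, List[str]] = {
    23: ["23", "20-29", "0-49", "*"],
    27: ["27", "20-29", "0-49", "*"],
    41: ["41", "40-49", "0-49", "*"],
}

# Hypothetical provenance relation of one module: (input, output) pairs.
ROWS: List[Tuple[int, str]] = [(23, "low"), (27, "low"), (41, "high")]


def generalize(value: int, level: int, hierarchy: Dict[int, List[str]]) -> str:
    """Return the ancestor of `value` at `level`, clamped to the top of the chain."""
    chain = hierarchy[value]
    return chain[min(level, len(chain) - 1)]


def publish(rows: List[Tuple[int, str]], level: int,
            hierarchy: Dict[int, List[str]]) -> List[Tuple[str, str]]:
    """Generalize the input attribute of every row to the given level."""
    return [(generalize(x, level, hierarchy), y) for x, y in rows]
```

With these assumed inputs, `publish(ROWS, 1, AGE_HIERARCHY)` maps the inputs 23 and 27 to the same interval `20-29`, so an observer can no longer tell which exact input produced a given output, while the published relation still conveys the coarse input-output behaviour.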
