Privacy Protection Workflow Publishing Under Differential Privacy

Lineage workflows are widely used in data quality assessment, erroneous-data location, and other fields. As data sharing deepens, so does the need to share data lineage. The topology of a lineage workflow carries private information about how the data were generated, that is, the structural privacy of the lineage workflow, so publishing the workflow directly leaks that structural privacy. Existing methods for protecting the structure of lineage workflows have two deficiencies: (1) methods based on restricted release have a weak theoretical foundation and can measure the privacy protection of the workflow structure only qualitatively; (2) they focus on maintaining the local mapping relationships of modules and do little to maintain the key paths of the lineage workflow. To address these problems, this paper proposes PPWP-DP, a privacy protection method for lineage workflow structure that satisfies differential privacy. The concepts of key path and key path priority are introduced. On this basis, the θ-project projection algorithm is proposed to reduce the degree of the lineage workflow while preserving the reachability of high-priority key paths according to the user's preference for key path priority. The concept of the oi-sequence is introduced to extract the structural characteristics of the lineage workflow, and Laplace noise is added to the oi-sequence to satisfy the differential privacy constraint. The θ-project algorithm also adjusts the global sensitivity of the oi-sequence, which reduces the scale of the Laplace noise. Finally, the perturbed oi-sequence is used to reconstruct the lineage workflow for publication, achieving both workflow privacy and the maintenance of key path reachability. Theoretical analysis and experiments verify the effectiveness of the proposed algorithm.
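The noise-addition step can be illustrated with a minimal sketch of the standard Laplace mechanism. This is not the PPWP-DP implementation: the abstract does not define the oi-sequence or state its global sensitivity, so the sketch assumes, purely for illustration, that the extracted feature is a flat list of integer counts and that the sensitivity is supplied by the caller. The names `laplace_noise`, `perturb_sequence`, and `counts` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exp(1) variables follows a
    standard Laplace distribution, so scaling it gives Laplace(0, scale).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def perturb_sequence(values, sensitivity, epsilon, rng=None):
    """Add independent Laplace(sensitivity / epsilon) noise to each entry.

    `sensitivity` is assumed to be the global L1 sensitivity of the
    whole sequence; it is an input here, not derived from a workflow.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Hypothetical feature sequence standing in for an oi-sequence.
counts = [2, 1, 3, 0, 1]

# A smaller sensitivity (for example, after bounding the degree)
# yields a smaller noise scale for the same privacy budget epsilon.
noisy_high = perturb_sequence(counts, sensitivity=8.0, epsilon=1.0,
                              rng=random.Random(0))
noisy_low = perturb_sequence(counts, sensitivity=2.0, epsilon=1.0,
                             rng=random.Random(0))
```

Because the noise scale is `sensitivity / epsilon`, any projection that lowers the sensitivity reduces the expected distortion at a fixed privacy budget, which is the stated motivation for bounding the workflow degree before perturbation.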
